Abstract

A subjective expected utility policy making centre managing complex, dynamic systems needs to draw 
on the expertise of a variety of disparate panels of experts and integrate this information coherently. 
To achieve this, diverse supporting probabilistic models need to be networked together, the output 
of one model providing the input to the next. In this paper we provide a technology for designing 
an integrating decision support system and for enabling the centre to explore and compare the efficacy
of different candidate policies. We develop a formal statistical methodology to underpin this tool. In 
particular, we derive sufficient conditions that ensure inference remains coherent before and after relevant 
evidence is accommodated into the system. The methodology is illustrated throughout using examples 
drawn from two decision support systems: one designed for nuclear emergency crisis management and 
the other to support policy makers in addressing the complex challenges of food poverty in the UK. 

Keywords: Bayesian multi-agent models, causality, coherence, decision support, graphical models, likelihood
separation.


1 Introduction 

Using a probability model for decision support for a single user has many advantages: as well as ensuring 
coherence, and hence transparency, recent computational advances have enabled such support to be fast, 
even when large amounts of structured information need to be accommodated. However, the 21st century
has seen the advent of massive models which need to be networked together to provide appropriate decision
support in increasingly complex scenarios (see e.g. Figure 1). Each component of such a network is itself
often informed by huge data sets. In these contexts, users are typically decision centres where both users and 
experts are teams rather than individuals. Such centres often need a tool that can draw together inferences in 
this plural environment and integrate together expert judgements coming from a number of different panels 
of experts where each panel is supported by their own, sometimes very complex, models. 

Although, increasingly, many of these component expert panels are supported by probabilistic models, 
it is natural but usually inappropriate to commission a single comprehensive probabilistic model over the 
whole composite, except in the case of relatively small systems. Such an overarching probability model would 
be huge and, perhaps more critically, unless there existed shared structural assumptions, no single centre 
could realistically ‘own’ all the statements about the full joint distribution of the hundreds of thousands 
of diverse random variables in the aggregate system. Furthermore, and from a more practical perspective, 
even if it were possible to build such a system, typically, in the types of domain we address in this paper, 
the different component systems are being constantly revised by the relevant panels to accommodate new 
understanding, science and data. Any overarching probabilistic model would therefore quickly become 
obsolete: the judgements it embodies would no longer reflect current understanding. 
Figure 1: A plausible network of models for a decision support system in nuclear emergency management (from Leonelli and
Smith (2013)), where nodes of the same colour are associated with the same domain of expertise.


In this paper we argue that what is often needed instead of a single, overarching probability model is an
integrating decision support system (IDSS). This would process only carefully selected probabilistic outputs
from each contributing component - those statistics on which expected utilities would need to depend. It 
would then combine these dynamically evolving expert judgements together appropriately to provide the 
basis for a coherent assessment. In particular, benchmark numerical efficacy scores for each candidate policy 
would then be calculated from the individual components’ outputs and thus help determine the efficacy of 
each of the different options considered by the decision centre. 

In fact, theory and methods for breaking up probability models into autonomous subcomponents are
already well developed by agent-based modellers (see e.g. Xiang, 2002), albeit when addressing automated
decision-making rather than the anthropomorphic decision-making we address here. Perhaps even more relevant
to this development is the seminal paper by Mahoney and Laskey (1996) who apply general engineering
principles to develop protocols for the coherent integration of evidence over a diverse panel of experts informing
a problem. Although they quickly focus their ideas down on to the specific Bayesian network (BN) model
class, their framework is nevertheless a valuable one and is currently being exploited within the context of
object-oriented BNs (OOBNs) by a number of authors (e.g. Johnson et al. 2014; Johnson and Mengersen
2012).


For the purposes of this paper we return to the more general setting described initially by Mahoney and
Laskey (1996). Here we apply analogous principles to even larger problems than they envisaged, developing
a sound statistical methodology that justifies the use of an IDSS. The challenges faced by chains of panels of
experts using data and models to deliver probabilistic beliefs have been noted in French (2011), who argued
that very little had been said on the issue so far. Some early work on this type of problem was suggested for
nuclear emergency management in French et al. (1991). However, as far as we are aware, this paper is the
first to approach these types of problem from a methodological and statistical viewpoint.

To highlight the relevance of an IDSS to support policymakers in current, complex domains we start by 
discussing two applications where we have observed the necessity of knitting together different models into 
a coherent whole, which together provoked the methodological development within this paper. 


1.1 Two systems we have appraised 

1.1.1 RODOS and nuclear emergency management 

In 1986 an explosion at one of the reactors of the Chernobyl nuclear power plant released a radioactive
plume into the environment contaminating large areas of the former Soviet Union. To protect people and
food stocks, measures were taken by the governments of the affected countries. Further different and often
conflicting responses were taken by many European countries after the accident, confusing the public, and
leading to an ineffective implementation of countermeasures (Papamichail and French 2013; Walle and Turoff
2008). It was therefore quickly recognised that a comprehensive response to nuclear emergencies within the
European community was needed. To achieve this, a common decision support system (DSS) for off-site
emergency management was commissioned. Several institutes in Europe then started the development of
the Real-time On-line DecisiOn Support system (RODOS) for nuclear emergencies, including uncertainty
handling methodologies which would provide consistent predictions unperturbed by national boundaries
(Ehrhardt et al. 1993). One author of this paper was heavily involved in this development.

Strategy   Number        Number protected   Estimated number   Estimated number   Cost
           relocated     by other means     of fatal cancers   of hereditary      (billions of
           (thousands)   (thousands)        averted            effects averted    roubles)
SL2_2      706           0                  3200               500                28
SL2_10     160           546                1700               260                17
SL2_20     20            686                650                100                15
SL2_40     3             703                380                60                 14

Table 1: A sample RODOS output breaking down the score of various strategies for the different factors deemed relevant.


Alongside the development of RODOS, in the early 1990s the International Chernobyl Project began to
explore the factors that drove decision making about which protective measures to adopt after the Chernobyl
accident. Since, at the time, many different parties and institutions were involved in this decision making
process, the study was organised through five decision conferences, where simple multicriteria
decision analysis models were used to explore the preferences and the beliefs of the different parties (French
et al., 2009). The analyses performed during these meetings clearly showed that the factors implemented in
cost-benefit analyses, usually performed in nuclear emergency management, could not fully describe the
preferential structure of the group. It was therefore decided that multicriteria methods had to be included in
any operational DSS like RODOS designed for nuclear emergency response. Such a DSS would then combine
scientific knowledge about the likelihood of different events with the value judgements about these to rank
different agreed available policies and both facilitate the exploration and create a deeper understanding of
the problem at hand. A sample output from RODOS is shown in Table 1, where high scoring countermeasures
are detailed together with a breakdown of the impact of the policies on the relevant factors identified through
decision conferencing. Supporting capabilities of this type, presented in this and other more refined forms
(Papamichail and French 2003, 2005), were found to be vital for integrated decision support. This is because
empirical research has shown that decision makers do not accept the suggestions of a system which does not
provide a rationale for the outputs it produces, even if these outputs happen to be accurate (Papamichail
and French 2013).
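The ranking step described above can be sketched with a simple additive multi-attribute value function applied to the figures in Table 1. The weights, the restriction to three criteria and the min-max rescaling are illustrative assumptions only: they are neither the preferences elicited in the decision conferences nor the scoring actually used in RODOS.

```python
# Figures taken from Table 1; everything else here is a hypothetical illustration.
TABLE_1 = {
    "SL2_2": {"cancers_averted": 3200, "hereditary_averted": 500, "cost": 28},
    "SL2_10": {"cancers_averted": 1700, "hereditary_averted": 260, "cost": 17},
    "SL2_20": {"cancers_averted": 650, "hereditary_averted": 100, "cost": 15},
    "SL2_40": {"cancers_averted": 380, "hereditary_averted": 60, "cost": 14},
}

# Assumed criterion weights (not elicited from any decision maker).
WEIGHTS = {"cancers_averted": 0.4, "hereditary_averted": 0.2, "cost": 0.4}
LOWER_IS_BETTER = {"cost"}


def rescale(values, lower_is_better):
    """Min-max rescale one criterion to [0, 1], with 1 the most preferred level."""
    lo, hi = min(values.values()), max(values.values())
    scaled = {k: (v - lo) / (hi - lo) for k, v in values.items()}
    if lower_is_better:
        scaled = {k: 1.0 - s for k, s in scaled.items()}
    return scaled


def additive_scores(table, weights):
    """Weighted sum of rescaled criteria for each strategy."""
    scores = {strategy: 0.0 for strategy in table}
    for criterion, weight in weights.items():
        column = {s: row[criterion] for s, row in table.items()}
        scaled = rescale(column, criterion in LOWER_IS_BETTER)
        for strategy, value in scaled.items():
            scores[strategy] += weight * value
    return scores


scores = additive_scores(TABLE_1, WEIGHTS)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these particular weights SL2_2 scores 0.6 and SL2_40 scores 0.4, but the ordering is sensitive to the weights chosen, which is why a breakdown by factor, as in Table 1, matters to the decision makers.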


An evaluation of the potential unfolding trajectories of an emergency was achieved by pasting together
the outputs of a suite of different subsystems (or modules). Each such module provided estimates and
forecasts for a different aspect of the emergency. At that time there was an acute awareness that uncertainty
evaluations had a critical role to play in such systems (Smith et al. 1997). However, it was also the case that
the formal accommodation of such uncertainties could not be made homogeneously. The system needed to
use a variety of deterministic and stochastic methodologies to guide the estimation and the forecasting of the
various quantities relevant to the domain under study (Ehrhardt 1997; Smith et al. 1997). A few modules
were statistical in nature, but others were guided by fuzzy logic and many others were entirely deterministic.

Once uncertainty management for these networks of systems became acknowledged as central to the effective
implementation of the composite system, fully probabilistic component modules began to be developed
for communicating both the relevant panel’s forecasts and their associated uncertainties. There were several
examples of these. For instance, a source term module estimating the likelihood of a release of contamination
from the plant was built (French 1995). Others included atmospheric diffusion and deposition models
describing the spread of contamination (Smith and Papamichail 1999; De and Faria 2011). Additional
subsystems modelled the effect that the spread might have because of the exposure of humans, animals
and plants (Richter et al. 2002; Zheng et al. 2009). However, these developments were patchy. This and
the significant extra computational costs meant that any nuanced full integration of uncertainty evaluations
associated with the whole system was severely inhibited.

It was not only the lack of technology and the heterogeneity of uncertainty outputs of the different
component models that challenged a comprehensive and proper accommodation of uncertainty. It was
recognised early on that, even if all models could produce faithful and consistent representations of the
outputs’ uncertainty for each module, it was not at all clear at that time what formulae to use to combine
these judgements and what background information justified the use of these forecasts. The fact that
the component modules were designed to work independently of one another and that the exact judgements
encoded within them were often contributed by very differently informed groups of scientists made this formal
combination especially challenging. Because of these computational and methodological constraints, the
modules’ outputs ended up being collated together in a simple, essentially deterministic way by transferring
from one module to another a single vector of means about what might happen and hence effectively ignoring
any uncertainty associated with them. As statisticians, we appreciate how such a naive method can be very
misleading (see e.g. Leonelli and Smith 2013, 2015).
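A minimal sketch of why transferring only a vector of means can mislead is given below. It is not drawn from RODOS: the two modules, the log-normal upstream forecast and the quadratic downstream response are hypothetical choices made purely for illustration. The sketch compares plugging the upstream mean into a nonlinear downstream module with propagating the whole upstream distribution through it.

```python
import random


def downstream_module(release):
    """Hypothetical downstream module: response grows quadratically with release."""
    return release ** 2


def plug_in_mean(samples):
    """Pass only the mean of the upstream output to the downstream module."""
    mean_release = sum(samples) / len(samples)
    return downstream_module(mean_release)


def propagate_distribution(samples):
    """Push every upstream sample through the downstream module, then average."""
    return sum(downstream_module(s) for s in samples) / len(samples)


rng = random.Random(0)
# Hypothetical upstream forecast: a log-normal release magnitude (mu=0, sigma=1).
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(100_000)]

plug_in = plug_in_mean(samples)              # estimates (E[X])**2
propagated = propagate_distribution(samples)  # estimates E[X**2]
```

For this log-normal example the plug-in quantity estimates (E[X])^2 = e, roughly 2.72, whereas the propagated quantity estimates E[X^2] = e^2, roughly 7.39. The gap is exactly the upstream variance, which the means-only transfer discards.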

Since that time the need for integrated support tools addressing other threats from uncertain environments
has been recognised. Perhaps one of the most critical of these concerns food security.


1.1.2 UK food security 

Food security, once thought to be a problem confined to low-income countries, is increasingly being recognised
as a matter of concern in the UK (DEFRA 2009; Elliott 2014; Lambie-Mumford and Dowler 2014; Loopstra
et al. 2015), USA (USDA 2012), Canada (Loopstra and Tarasuk 2012) and other wealthy nations. At a
country level, all these nations appear to be among the most food secure in the world (Economist 2015). But at
household level, the story is quite different. Enabling citizens to have access at all times to sufficient
nutritious food for an active and healthy life is a key responsibility of governments, but achieving this is not
straightforward. At first glance, UK household food security may seem to be a simple case of demand and
supply. However, on closer inspection the system is shown to be highly complex, especially from the point of
view of policymakers, who endeavour to intervene on the system to produce specific responses (Morris et al.
2000; Drewnowski and Specter 2004; Lambie-Mumford and Dowler 2015).


The food system is global, multifaceted and influenced by a huge number of public and private actions 
and uncontrolled factors such as weather and climate. This leads to a great deal of uncertainty about how 
any policy decision or strategy will play out. We are now taking up this new challenge. Since the domain 
is still developing we have had the opportunity to develop an overarching methodology to manage this 
system through the integration of diverse probabilistic systems and through this a proper management of 
uncertainty. We are currently working with Warwickshire County Council to develop an IDSS to support 
decision-making around household-level food poverty. During this process we have recognised that a DSS
to support policymakers in this new domain of application would need to have many features in common
with RODOS, whilst embedding complete uncertainty handling both within and between the constituent
modules, each informed by different panels of experts.

Firstly, the system is multifaceted and heterogeneous and requires as inputs the judgements from different 
panels of experts in diverse disciplines including insight about factors elevating the risk to food security of 
households (from sociologists and local authorities), judgements about the effects of malnutrition on the 
population (from doctors and nutritionists), estimates of the availability of food in supermarkets and other 
outlets (delivered by supply chain experts) and forecasts of the yield of crops in a particular season (by crop 
experts and official statistics). Unless properly structured, this expert information is liable to conflict where 
two or more panels can sometimes deliver contradicting expert judgements about a shared random variable. 
If the system admits such contradictions then this can obviously threaten the coherence of the system as 
a whole so that its outputs become compromised. For instance, both estimates of cost of oil and weather 
forecasts affect food production, food transport and the ability of households to access food. If these latter 
variables are under the jurisdiction of different panels, any integrating system should surely embed common 
estimates of distributions over the cost of oil and weather forecasts and not contradicting ones. Otherwise 
how could it ever be coherent and justifiable? 
Secondly, as was also true for the RODOS system described above, information that is available to inform
each element of the system is patchy. Some parts of the system, for instance food production and household
demography, are well modelled and informed by many different datasets and a great deal is known about
the impacts of malnutrition (Elia et al. 2010). On the other hand, other subsystems are not necessarily
so well informed. For instance, there is considerable uncertainty about world food availability because food
imports, exports and prices are highly variable and affected by a large number of factors such as weather,
wars, international relations and even, unexpectedly, by another country’s internal financial regulations
(Lagi et al. 2012a). Lastly, it is clear that any IDSS supporting this domain needs to be dynamic, not only
because the food system is highly seasonal but also because there are many time steps for the consequences 
of a chosen policy as well as shocks experienced by the system to unfold. 

There is now a wealth of qualitative information in the sociological literature on the causes and impacts
of food security at household level in the UK (Dowler et al. 2001; Holmes 2008; Field et al. 2014; Dowler
2014). This provides a sound basis for modelling the qualitative structure of the system. Within the UK,
Dowler et al. (2001) describe the strategies used by families to negotiate poor access to food resources. It
has now been realised that strategies more subtle than taxing nutrient-poor foods and subsidising nutrient-rich
foods are required to effect change in purchasing habits (Darmon et al. 2004, 2014; Dowler 2010; Elia
et al. 2010; Friel and Conlon 2004; Holmes 2008; McBride and Purcell 2014). Whilst the UK does not
measure household-level food security, despite calls to do so for the last 20 years, culturally similar countries,
the USA and Canada, regularly deploy an 18-question survey called HFSS (USDA 2012). This survey has
also been implemented in small-scale studies in the UK (Holmes 2008; Pilgrim et al. 2012), providing insight
into the extent to which USA and Canada findings may be used to provide a first approximation for the
food poverty crisis now unfolding within the UK.

Many factors affecting household level risks for food security (disposable income, socio-economic status, 
social security levels, employment, housing and energy costs, access to credit, and so on) can be found for 
the UK in official statistics, at national, regional, county and sub-county level (lower layer super output 
areas (LSOA) and middle layer super output areas (MSOA)). For the Warwickshire decision-makers, MSOA 
level is the most appropriate, and a repository of data has been collated for use in a Warwickshire IDSS. But 
where data are not available at this level, they can be modelled using the spatial granularity that is available 
or estimated from it by structured elicitation of expert opinion. On the food supply side, there are some 
localities which are food deserts: so called because their local population is predominantly in low-income
households and the opportunity for profit is insufficient to incentivise supermarkets to site full-service stores.
In such localities, households must either purchase food locally, typically at small convenience stores, or finance
transport to reach the larger stores. The small local stores often stock a smaller range of goods and a
much smaller range of fruits and vegetables, often at a significantly higher price (Donkin et al. 1999). The
number of regions in the UK which are food deserts is likely to increase as UK supermarket giants close
stores to address their falling profits (Joyce et al. 2014).


Whilst numerous DSSs exist to model aspects of the system such as for supermarket siting and to inform
(Decuyper et al. 2014; Efendigil et al. 2009; Hernandez and Bennison 2000; Kuo
et al. 2002), the complex problem of developing a shared methodology that can guide the accommodation of
the wide range of expertise and provide the information required to evaluate the efficacy of various policies
designed to address food poverty issues has yet to be attempted. This and other applications, like the needs
of the RODOS project, have motivated the methodological developments we present below.


1.2 Some examples of established probabilistic composites 

Although, for example, in food security we are only now in the process of building a suite of fully probabilistic 
modules to populate the IDSS and in nuclear emergency management some parts of the system were still 
deterministic, for the purposes of this paper we henceforth assume that we are in the position where there 
already exist probabilistic models that will describe the different components that our integrating system 
will network together. The type of probabilistic DSSs which will form the components of our integrated 
system are now widely available in a variety of forms. Two of the most common probabilistic models for 
multivariate systems designed to be used by a single agent or panel in non-dynamic environments are BNs 
and influence diagrams. Both of these frameworks, whilst still being refined, are now well developed and
have been applied to ever larger systems (Aguilera et al. 2011; Gomez 2004; Cowell et al. 2011; Molina
et al. 2010).


The component domains now modelled are often complex, dynamic and can themselves collate diverse
inputs. Most recently, significant methodological developments using, for example, object-oriented code
have enabled these models to become progressively more expressive, efficient and applicable as components
within the types of dynamic environments we address here (Koller and Pfeffer 1997; Koller and Lerner 2001;
Murphy 2002; Nicholson and Flores 2011). However, BNs and their dynamic analogues are not the only
framework around which probabilistic models have been built. Hierarchical Bayesian models of large-scale
temporal spatial processes modelling, for example, the development of epidemics of one sort or another are
well established (Best et al. 2005; Jewell et al. 2009; McKinley et al. 2014). Other modelling tools that
have also recently appeared are those based on probability trees (Thwaites et al. 2010; Smith 2010). These
have been successfully applied in a number of applications where the potential development of a scenario is
heterogeneous (Smith and Anderson 2008; Smith 2010).

Another class of probabilistic models used in large, complex systems is based on probabilistic emulators
(Kennedy and O’Hagan 2001; O’Hagan et al. 2006). These have, in particular, been widely used for climate
and environmental modelling. Such methods are particularly useful when the underlying science is encoded
using deterministic simulators based, for example, on collections of deterministic differential equations. A few
costly runs from such massive simulators are taken from certain designed scenarios and these are then used to
frame judgements across the whole space of interest using either Gaussian processes anchored at the results
of the runs (Conti et al. 2009; Sanso et al. 2008) or Bayes Linear methods which use plausible continuity
assumptions to interpolate expectations and their associated uncertainties over the whole domain of interest
(Williamson and Goldstein 2011; Williamson et al., 2012). These are all able to provide probabilistic outputs
and so, in particular, the various moments of critical features of the problem which, as we demonstrate later,
are needed by our IDSS.
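The idea of a Gaussian process anchored at the results of a few simulator runs can be sketched as follows. This is a toy illustration under stated assumptions, not any of the cited emulators: the one-dimensional input, the zero prior mean, the squared-exponential kernel with unit variance and length scale, and the `simulator` stand-in are all hypothetical.

```python
import math


def sq_exp_kernel(x1, x2, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def emulate(design, runs, x_new, jitter=1e-9):
    """Posterior mean and variance of a zero-mean GP emulator at x_new."""
    n = len(design)
    k = [[sq_exp_kernel(design[i], design[j]) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(x_new, xi) for xi in design]
    alpha = solve(k, runs)    # K^{-1} y
    beta = solve(k, k_star)   # K^{-1} k_star
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = sq_exp_kernel(x_new, x_new) - sum(ks * b for ks, b in zip(k_star, beta))
    return mean, max(var, 0.0)


def simulator(x):
    """Stand-in for a costly deterministic simulator (hypothetical)."""
    return math.sin(x)


design = [0.0, 1.0, 2.0, 3.0]              # designed scenarios
runs = [simulator(x) for x in design]      # the few costly runs
mean_at_run, var_at_run = emulate(design, runs, 1.0)
mean_between, var_between = emulate(design, runs, 1.5)
```

At a design point the emulator reproduces the run with essentially no uncertainty, while between design points it returns an interpolated mean together with a positive variance. It is these moments, rather than a single point forecast, that such a module could deliver to an integrating system.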

One element all these types of model have in common is that they are not just arbitrary probability 
models. As well as being able to deliver various conditional probability statements if queried, all are also 
able to deliver a rationale that lies behind the delivered numerical evaluations. This rationale can of course 
take one or several different forms: an underlying scientific justification, based on experimental, survey or 
observational evidence, simulation runs from detailed complex systems, carefully elicited judgements from 
respected experts in the field and so on. But, whatever form the justification might take, this therefore 
means that if challenged - for example by some external auditor or regulator - a robust defence of the 
probability statements can be given. For example, forensic DSSs concerning DNA evidence are based on 
established scientific theory and a plausible dependence model. Economic forecasts will be based on defensible 
models of the economy together with observational data, climate change forecasts on simulator runs coupled 
with emulators that interpolate these results more widely under plausible smoothness assumptions, BNs of 
ecological systems on carefully elicited structural relationships between its variables and their conditional 
probability tables, also usually themselves supported by observational and scientific data. These types of 
explanation are a vital component for any DSS making it a compelling, practical tool. 

The point is therefore that for any sort of decision support it is not enough to demand only coherence in
its formal sense. It is also essential to be able to provide - if called upon to do so - a narrative which defends
the probability models and is plausible enough to encourage a third party scrutinising the model to at least
suspend their beliefs and accept that the probability statements are plausible (Smith 1996). Statistically
motivated systems with this property are sometimes called internalist by philosophers: see e.g. Peterson
(2009). All probabilistic models in this domain we are aware of are not simply black boxes but have this
additional property. So this facility will also henceforth be assumed for any contributing component model.

A diverse collection of different types of probability model - each carrying its own supporting narrative
- is now available to support single panels of experts within a composite system. Some of these models may
be very large, encompassing long vectors of explanatory observables and modelling complex relationships 
between them. Others might be entirely subjective and reflect the expert judgements of the panel in a 
probabilistic form. But in all cases, it is reasonable to request that a panel delivers a collection of outputs -
typically various expectations of functions of explanatory conditioning variables - together with the ability to
supply a supporting narrative of the type discussed above. We will demonstrate that our methodology then 
determines, not only what these summary outputs should be, but also describes how they can be processed 
to provide a decision centre with a coherent and global picture of the process as a whole. 


1.3 What an IDSS does 


Given a suite of different models, like the ones reviewed in the previous section, an overarching probabilistic 
methodology needs to enable us to accommodate the diversity of information and its intrinsic uncertainty 
coming from these submodules into the system. The motivation of this paper is to determine when and 
how a supporting narrative can be composed around the component narratives discussed above that can 
then be used to explain to any outside auditor the rationale behind the choice of decision taken in both 
a formally justifiable and plausible way. To address this issue we start by looking at a small system and 
then gradually increase the size of applications so that the ideas can also be applied to the large domains of 
nuclear emergency management and UK food security applications reviewed in Section 1.1.

Under certain hypotheses, and given a variety of structural assumptions, we are able to develop a methodology,
similar to a standard Bayesian one, where decisions can be guaranteed to be coherent, i.e. expected
utility maximising for some utility and probability distribution derived from individual but connected suites
of models of the types discussed above, and defensible enough to support a composite narrative in a sense we
will define precisely in Section 3. The derived IDSS will then be able to fully support a subjective expected
utility maximising crisis centre or policy making forum in a justifiable way and help it draw together all the
evidence distributed across different sources whilst properly taking into account the strength of the evidence
on which these judgements are made. We demonstrate in particular that these properties are often implicit
when, in a formal sense, the system is causal: an assumption we later argue is implicitly made when building
real models.

The methodology developed in this paper then provides not only a framework for faithfully encoding 
all usable and informed expert judgements and data leading to scores for competing candidate policies but 
also an overarching narrative explaining the derivation of these scores. This narrative will be composed of a 
sequence of sub-narratives delivered by the particular relevant panels of experts. So, in this sense, it is based 
on best evidence. This then provides a platform around which a decision centre can discuss the evidence 
supporting one policy against another. On the basis of this platform, assessments can be discussed and 
revised where deemed necessary: an interactive capability commonly recognised vital to any such IDSS. 

We show that sufficient conditions under which such an interactive IDSS can be built are ones that lead
to the system being distributive. By this we mean that it is coherent for each panel to autonomously focus
only on its own field of expertise and update its beliefs about the domain under its jurisdiction when new
evidence is introduced in the IDSS. We later show that we can often attain this property provided that the
IDSS has an appropriate protocol guiding the nature and quality of the data input by each of its component
systems. This distributivity property gives the added benefit that the expected utilities it needs can typically
be calculated very quickly using algorithms analogous to fast propagation algorithms used in BNs. These
algorithms are customised to an overarching agreed dependence structure across the system as a whole and
have recently been discussed in Leonelli and Smith (2015). They then constitute the inferential engine of an
IDSS and make their outputs not only formal and transparent but also feasible to implement.
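As a toy illustration of how expected utilities might be assembled from panel summaries alone, consider two panels in a chain. This is a sketch under strong assumptions, not the propagation algorithm of Leonelli and Smith (2015): the linear regression of Y2 on Y1, the quadratic-loss utility, the class names and all the numbers are hypothetical. The first panel delivers a mean and variance for Y1, the second delivers the coefficients of E[Y2 | Y1] and a residual variance, and the tower rule combines them.

```python
from dataclasses import dataclass


@dataclass
class Panel1Summary:
    mean: float      # E[Y1 | d]
    variance: float  # Var[Y1 | d]


@dataclass
class Panel2Summary:
    intercept: float          # a in E[Y2 | Y1, d] = a + b * Y1
    slope: float              # b
    residual_variance: float  # Var[Y2 | Y1, d], assumed free of Y1


def expected_utility(p1, p2, cost):
    """E[-(Y2 ** 2) - cost | d], computed from panel summaries via the tower rule."""
    second_moment_y1 = p1.variance + p1.mean ** 2
    second_moment_y2 = (
        p2.residual_variance
        + p2.intercept ** 2
        + 2 * p2.intercept * p2.slope * p1.mean
        + p2.slope ** 2 * second_moment_y1
    )
    return -second_moment_y2 - cost


# Hypothetical summaries delivered by each panel under two candidate policies.
policies = {
    "do_nothing": (Panel1Summary(2.0, 1.0), Panel2Summary(0.0, 1.0, 0.5), 0.0),
    "intervene": (Panel1Summary(1.0, 0.25), Panel2Summary(0.0, 1.0, 0.5), 2.0),
}
scores = {d: expected_utility(*summary) for d, summary in policies.items()}
best = max(scores, key=scores.get)
```

In this sketch each panel can revise its own summary without reference to the other, and the score for each policy is recomputed from a handful of moments. Which moments are required is dictated by the form of the utility: a quadratic utility needs second moments, not just means.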

We start by setting up this formal framework for the combination of panels’ judgements and subsystems in
Section 2 and introduce a toy example to illustrate the challenges and opportunities presented by even very
small networks of systems. We then prove in Section 3 some key results that enable us to address these new
inferential challenges in more complex settings. In particular, we derive a set of conditions which ensure an
IDSS is coherent and a faithful expression of this composite process in a sense to be made explicit later. We
also show how and when such a system can legitimately devolve judgements to domain experts so that the
IDSS remains distributed and so feasible as well as sound. In these settings both estimation and validation
can be performed locally by the individual panels of experts contributing to the composite inference. In
Sections 4 and 5 we proceed to illustrate our methodologies as they might apply to a range of different
overarching structures, including dynamic ones.

2 Networks of probabilistic expert systems 


2.1 Some technical structure 


Assume that the different components of a network of processes are evaluated and overseen by $m \in \mathbb{N}$ 
different panels of domain experts, $\{G_1, \ldots, G_m\}$, and let $[m] = \{1, \ldots, m\}$. Examples of such diverse panels 
operating various components of a network have already been given in Section [1]. Let $d \in \mathcal{D}$ denote a 
control strategy or policy a decision centre might adopt from a class $\mathcal{D}$ of available policies. We envisage 
that a large vector of random variables measures various features of an unfolding future. Henceforth denote 
the vector of these random vectors by $\mathbf{Y} = (\mathbf{Y}_i^T)_{i \in [m]}$, where $\mathbf{Y}_i$ takes values in $\mathcal{Y}_i(d)$, $i \in [m]$, $d \in \mathcal{D}$. Often 
these random vectors will be indexed by time. Panel $G_i$ will be responsible for the output vector $\mathbf{Y}_i$, $i \in [m]$. 
The implicit (albeit virtual) owner of beliefs expressed in the system will be henceforth referred to as the 
supraBayesian (SB). 

For each $d \in \mathcal{D}$, each panel $G_i$ will be asked to give the SB various summaries of the probability 
distribution of the subvector $\mathbf{Y}_i$ of $\mathbf{Y}$ over which $G_i$ has oversight, conditional on certain measurable functions 
taking values $\{A_i(\mathbf{Y}) : A_i \in \mathbb{A}_i\}$, where $\mathbb{A}_i$ could be null. So for example $\mathbb{A}_i$ could be the set of different 
possible combinations of levels of the covariates on which the vector $\mathbf{Y}_i$ might depend. The SB will need 
to process these necessary probabilistic features provided by the different panels. She will then use these 
to calculate various statistics of a potential decision centre’s reward vector, some function of $\mathbf{Y}$. Using 
these features, the SB will then calculate her expected utility scores $\{\bar{U}(d) : d \in \mathcal{D}\}$ for each policy to which 
she might commit. Such scores will obviously be functions of the centre’s utility $U(\mathbf{y}, d)$, where $\mathbf{y}$ is an 
instantiation of $\mathbf{Y}$, drawn from some class of utilities $\mathcal{U}$, together with any structural modelling assumptions and the 
probability statements provided by the individual panels. Note that, as illustrated in the RODOS example 
above, such a utility function will usually have several attributes as its arguments. With these scores the 
decision centre will then be able to identify a decision $d^* \in \mathcal{D}$ with the associated highest score $\bar{U}(d^*)$, 
together with other high-scoring decisions for further comparison and discussion. 
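
The scoring step just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the SB's predictive distribution is taken to be available as a finite table for each policy, and the policy labels, probabilities and utility values are hypothetical rather than taken from any of the systems discussed here.

```python
# Hypothetical sketch: expected utility scores over a finite class of policies.
# All names and numbers are illustrative assumptions, not taken from the paper.

def expected_utility_scores(predictive, utility):
    """Return {d: sum_y p(y | d) * U(y, d)} for every policy d."""
    scores = {}
    for d, dist in predictive.items():
        scores[d] = sum(p * utility(y, d) for y, p in dist.items())
    return scores


def rank_policies(scores):
    """Policies sorted from highest to lowest expected utility score."""
    return sorted(scores, key=scores.get, reverse=True)


# Two hypothetical policies and a binary outcome y (1 = harm occurs).
predictive = {
    "evacuate": {0: 0.9, 1: 0.1},
    "shelter": {0: 0.7, 1: 0.3},
}


def utility(y, d):
    cost = {"evacuate": 2.0, "shelter": 0.5}[d]  # assumed cost of each policy
    return -10.0 * y - cost                      # assumed penalty for harm


scores = expected_utility_scores(predictive, utility)
ranking = rank_policies(scores)
best = ranking[0]  # the highest-scoring policy d*
```

Returning the full ranking rather than only the maximiser mirrors the point made above that other high-scoring decisions are kept for comparison and discussion.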

Ideally, for an IDSS to be defensible, it should endeavour to accommodate probabilistic information 
provided only by the most well-informed experts. For this to happen, it would make sense, for each 
choice of decision $d \in \mathcal{D}$, for each panel to donate only the probabilistic summaries associated with its own 
particular domain of expertise, and not its beliefs about the whole vector of components $\mathbf{Y}$. Some 
conditions that lead to the necessity of this from a formal coherence viewpoint are given in Section [3]. 

Assume then that $G_i$ will be able, possibly with the use of its own probabilistic DSS, to perform 
probabilistic inference over its own particular domain of responsibility. Typically, in practice, the panel’s 
chosen system will support decisions over much more complex scenarios than those concerned in the specific 
crisis management or policy forum our IDSS might be designed to inform. For this task, as discussed in 
more detail later, the integrating system will usually only need panels to deliver certain distributional 
summaries of the measurements $\mathbf{Y}$ under each chosen decision. So for example a production module for a 
food security IDSS may be able to predict the yield of a particular produce by farm. However, to inform the flow 
of food in the system it would need only to give its aggregate forecasts associated with produce as it arrives at 
market. The SB will then use these summaries as her own, in a way we will outline later in Section [5]. 

So assume that, for $i \in [m]$, $G_i$ will be required to deliver to the IDSS belief summaries denoted by 

$$\Pi_i^{y} \triangleq \{\Pi_i^{y}(A_i, d) : d \in \mathcal{D}, A_i \in \mathbb{A}_i\}.$$

These summaries will typically be various expectations of certain functions of $\mathbf{Y}_i$ conditional on the values 
in $\mathbb{A}_i$ taken by some subvector of $\mathbf{Y}$ for each $d \in \mathcal{D}$. So, for instance, in a BN these might be the expected 
probabilities in a set of conditional probability tables and $\mathbb{A}_i = \times_{j \in \Pi_i} \mathcal{Y}_j(d)$, where the parents of $\mathbf{Y}_i$ are 
$\{\mathbf{Y}_j : j \in \Pi_i\}$ and $\Pi_i \subseteq [i-1]$. Note however that $\mathbb{A}_i$ does not necessarily need to be a product space. For 
example in Section 4.2 we discuss an IDSS for which our methods still apply but whose asymmetric structure 
does not admit such tabular form. 

We show below that the belief summaries $\Pi_i^{y}$ can be determined once an overarching dependence framework 
has been agreed by all panellists in the system. Each will contain only those quantities $G_i$ will be required 
to deliver to the IDSS so that the IDSS is able to calculate its expected utility scores: under quite general 
conditions these often turn out to need only to be short vectors of expectations of certain functions conditional 
on the observations of certain events. This property - defined for the specific purpose of the IDSS - is 
central to being able to define a feasible IDSS even for large dynamic systems. 

Henceforth we assume that all panellists make their inferences in a parametric or semi-parametric setting 
where $\mathbf{Y}$ is parametrised by $\boldsymbol{\theta} = (\boldsymbol{\theta}_i)_{i \in [m]} \in \Theta(d)$, $d \in \mathcal{D}$. Here the parameter vector $\boldsymbol{\theta}_i$ parametrises $G_i$’s 
relevant sample distributions, $i \in [m]$. This may be infinite dimensional. When the parameter space of the 
system can be written as a product space, $\Theta(d) = \times_{i \in [m]} \Theta_i(d)$, where $\Theta_i(d)$ is $G_i$’s parameter space, we 
say that panels are variationally independent (see Dawid 2001). We henceforth assume this property holds. 
Were this not so then it would be necessary for a panel to state its beliefs about the value of $\boldsymbol{\theta}_i \in \Theta_i(d)$ in 
terms of parameters of the sample distributions of other panels. We need to try to avoid this dependence 
so that it is possible for the system to be distributive. We show that, happily, many causal systems can be 
parametrised so that variational independence does indeed hold. 

In this parametric setting, for each $d \in \mathcal{D}$ that might be adopted, each panel $G_i$, $i \in [m]$, has two 
quantities available to it. The first is a set of sample summaries over the future measurements for which 
it has responsibility, 

$$\Pi_i^{y|\theta} \triangleq \{\Pi_i^{y|\theta}(\boldsymbol{\theta}_i, A_i, d) : \boldsymbol{\theta}_i \in \Theta_i(d), A_i \in \mathbb{A}_i, d \in \mathcal{D}\}.$$

These might be the set of sample distributions associated with the predicted process 

$$\{f_i(\mathbf{Y}_i \mid A_i, \boldsymbol{\theta}_i) : \boldsymbol{\theta}_i \in \Theta_i(d), A_i \in \mathbb{A}_i, d \in \mathcal{D}\},$$

where $\boldsymbol{\theta}_i \in \Theta_i(d)$ parametrises $f_i(\mathbf{Y}_i \mid A_i, \boldsymbol{\theta}_i)$. For example, if $\mathbf{Y}$ were discrete and finite, then each panel 
might be asked to provide certain multi-way conditional probability tables over its subvector $\mathbf{Y}_i$, conditional 
on each $A_i \in \mathbb{A}_i$ and $d \in \mathcal{D}$. In this case $\boldsymbol{\theta}_i \in \Theta_i(d)$ would be the concatenated probabilities within all these 
tables for that chosen $d \in \mathcal{D}$. We have already noted that when a panel $G_i$ is supported by its own 
probabilistic system, then a typically much longer vector of parameters $\boldsymbol{\phi}_i \in \Phi_i(d)$ may be available to $G_i$ 
on which the panel is prepared to communicate uncertainty judgements. In this case, typically $\boldsymbol{\theta}_i$ will be a 
low-dimensional function of $\boldsymbol{\phi}_i$ capturing only $G_i$’s beliefs about features an IDSS needs. So, for example, a 
panel may have available a DSS designed to predict the health consequences of poisoning. But if an IDSS 
is designed to be used in an incident centre after a radiological accident, only the effect of poisoning from 
radiation, within the ranges of exposure of the accident and within ranges considered dangerous, will be 
needed for the decisions supported by the IDSS. Hence only the parameters of the margins of those features 
would need to appear in the $\boldsymbol{\theta}$ vector. 

Second, we will assume that each panel is able to express, and explain if questioned, its beliefs 

$$\Pi_i^{\theta} \triangleq \{\Pi_i^{\theta}(A_i, d) : A_i \in \mathbb{A}_i, d \in \mathcal{D}\},$$

about the parameters $\boldsymbol{\theta}_i \in \Theta_i(d)$, $d \in \mathcal{D}$, of its associated conditional distributions of $\mathbf{Y}_i \mid A_i(\mathbf{Y})$. Most 
generally, panel beliefs $\Pi_i^{\theta}(d)$ might be expressed in terms of panel densities $\pi_i(\boldsymbol{\theta}_i \mid A_i, d)$. So in our example, 
this would be a joint probability distribution over all the probabilities specified in the conditional tables above. 
Note that a panel would not normally need to divulge how these judgements were made. For example, it 
would not need to show the details of any prior to posterior analyses unless the panel were interrogated, for 
example, during emergency conferencing at the time of a crisis or by a regulator assuring the quality of the 
system before a crisis occurred. In a parametric IDSS the vector of summaries $\Pi_i^{y}$, mentioned above, can 
obviously be calculated by $G_i$ through marginalisation. 

Note that the inference performed by panel Gi to provide their outputs is autonomous. What this means 
is that they should have available not only their outputs but also evidence about the statistical validity 
of the structure and distributions they might define. This statistical justification - demonstrated by, for 
example, various diagnostic plots demonstrating the plausibility of modelling assumptions made within their 
component in the light of hard data evidence - can be assumed to be available on request, i.e. an audit 
trail behind each panel’s probabilistic judgements is in place if the centre needs to query a panel’s outputs. 
Such demonstrations will be henceforth assumed to be part of any supporting narrative, accessible through 
querying the component input. 

Panel $G_i$ will of course use various data available to them to infer their distribution of $\boldsymbol{\theta}_i$. If they do this, 
they will typically perform this inference autonomously: i.e. without reference to the other panellists. Now, 
it is by no means automatic that such autonomous updating will be justified if $G_i$’s inferences are going to 
be inherited by the composite system. Examples 2.5 and 2.6 below give illustrations of when such autonomy 
is not formally justified. Later in the paper we determine sets of sufficient conditions when such delegation 
is formally possible. 

Definition 2.1. We call an IDSS distributed if the SB’s beliefs are functions of the autonomously calculated 
beliefs of the individual panels $G_1, \ldots, G_m$. 


2.2 Common knowledge assumptions for an IDSS 


Let us begin by assuming that after a series of decision conferences (French et al. 2009) held jointly across 
the panels, and electronic communications, stakeholders and users have all agreed the types of decisions the 
IDSS will support to a sufficient level of specificity and provided an agreed qualitative framework across all 
interested parties around which a quantitative framework can subsequently be built. To this purpose we 
assume three properties hold. 


Property 1 (Policy consensus). All agree the class of decision rules $d \in \mathcal{D}$ whose efficacy might be examined 
by the IDSS. 


This class of feasible policies considered will depend not only on what is logical, such as when various pieces 
of information are likely to become available, but also what might be acceptable and allowable, either legally 
or for other reasons. Again the choice of D will often be resolved using decision conferencing across panel 
representatives, users and stakeholders, as was the case during the Chernobyl project and the construction of 
RODOS discussed above. In the case of the county council policy analysis, the decision space D contains the 
different ways to legally implement central government cutbacks in the services provided to the needy and 
vulnerable in the county. Although we do not dwell on this point here, for an efficient and transparent system 
it is critical to customize this functionality carefully, so that the IDSS supports the real decision-making of 
the centre. 


Property 2 (Utility consensus). All agree on the class U of utility functions supported by the IDSS. 

In the complex multivariate settings we address here, a utility function $U(\mathbf{y}, d)$ needs to entertain certain 
types of preferential independence across its various attributes, where these attributes will need a priori to 
be agreed. In the case of RODOS these were usually measures of health consequences, public acceptability 
and cost of each possible countermeasure policy taken over space and time. In both our illustrative examples 
the family of utility functions is simply one of value independence (Keeney and Raiffa 1993), although this 
is certainly not a necessary condition for our methods to apply (Leonelli and Smith 2015). 



Property 3 (Structural consensus). All agree the variables $\mathbf{Y}$ defining the process, where, for each $d \in \mathcal{D}$, 
each $U \in \mathcal{U}$ is a function of $\mathbf{Y}$, together with a set of qualitative statements about the dependence between 
various functions of $\mathbf{Y}$ and $\boldsymbol{\theta}$. Call this set of assumptions the structural consensus set and denote this 
by $\mathcal{S}$. 


This last consensus might be expressible through an agreement about the validity of a particular graphical 
or conditional independence structure across not only the distribution of $(\mathbf{Y} \mid \boldsymbol{\theta})$, but also that of $\boldsymbol{\theta}$ 
(Smith, 1996). This is then hard-wired into the IDSS. These types of assumptions are often complex, so we 
defer their discussion to later in the paper; examples of these, including those used in our illustrative 
applications, will be given in Sections [4] and [5] below. Other information that might be included in $\mathcal{S}$ could 
be a consensus about certain structural zeros or known logical constraints arising from a shared understanding of 
the meaning of certain variables. 

Definition 2.2. Call the set of common knowledge assumptions shared by all panels and which contains the 
union of the utility, policy and structural consensus $(\mathcal{U}, \mathcal{D}, \mathcal{S})$ the CK class. 

Technically we can think of the CK class as the qualitative beliefs that are shared as common knowledge by 
all the panel members and potential users, who all know they know, and so on. The CK class will be the 
foundation on which all inference within the IDSS will take place. Note that this class will depend not only 
on the domain and needs of users of the system, but also on the constitution and knowledge bases of the 
panels. 


Definition 2.3. Call an IDSS adequate for a CK class $(\mathcal{U}, \mathcal{D}, \mathcal{S})$ when the SB can unambiguously calculate 
her expected utility score $\bar{U}(d)$ for any decision $d \in \mathcal{D}$ and any utility function $U \in \mathcal{U}$ from the panel marginal 
inputs $\Pi_i^{y}$ provided to her by $G_i$, $i \in [m]$. 

An adequate IDSS will be able to derive a unique score for each $d \in \mathcal{D}$ on the basis of the panels’ inputs. An 
IDSS clearly cannot be fully functional unless it has this property. Note that it should be immediate, from 
the formulae a given probabilistic composition uses to calculate these expectations, whether or not the system 
is adequate. We illustrate such formulae later in the paper. 

To calculate $\{\bar{U}(d) : d \in \mathcal{D}\}$ the SB will need, together with the CK class, enough probabilistic information 
to compute the expectations of the corresponding utilities. At worst this might need to be the full 
distribution of $\mathbf{Y}$. More commonly, for typical choices of $\mathcal{U}$, all that might be needed is the distribution of 
the margins on certain specific functions of $\mathbf{Y}$ or simply some summaries such as a selection of its moments, 
again indexed by $d$. 

To be defensible - in the sense that the explanations of the appropriateness of its delivered outputs 
provided by panels can also be legitimately adopted by the IDSS - a parametric IDSS needs another property. 


Definition 2.4. Call an IDSS sound for a CK class $(\mathcal{U}, \mathcal{D}, \mathcal{S})$ if it is adequate and, by adopting the structural 
consensus, the SB would be able to admit coherently all the assessments $\Pi_i^{y|\theta}$ and $\Pi_i^{\theta}$ (and hence $\Pi_i^{y}$) as her 
own, the SB’s underlying beliefs about a domain overseen by a panel $G_i$ being $\{\Pi_i^{y|\theta}, \Pi_i^{\theta}\}$, $i \in [m]$. 


In Theorem 3.1 we give a set of sufficient conditions that in general guarantee the soundness of an IDSS. 
We note that in a surprising array of different circumstances an IDSS can be designed so that it is sound. 

A sound IDSS does not necessarily need to embody the full beliefs held by all panel members and based 
on the totality of their own individual evidence. This would often be inappropriate for a shared belief system, 
whose outputs will need to be defensible. For example, the evidence used to form the subjective judgements 
of individual panel members, although compelling to them, may derive from poorly designed experiments 
or simply be anecdotal. Because such information could not be robustly defended, it might not be possible 
for the centre to adopt it. So for example, an as yet unpublished observational study on those exposed to 
radiation after Chernobyl might strongly indicate that the effects of increase of cancers, commonly predicted, 
have been grossly exaggerated. Although the relevant panel might find this strongly compelling, it might 
not be appropriate to input this information into a common knowledge system because the study has yet to 
be adopted generally by the scientific community. 

The sound IDSS does, however, present a defensible and conservative position all panellists should be 
happy to communicate, and provides a benchmark for further discussion. In this sense the beliefs expressed 
in an IDSS are analogous to a pignistic belief system (Smets 2005): the best legitimate belief statements 
that can be made if the centre is called to act under uncertainty in a coherent and justifiable way. 

To illustrate how these properties might apply even in a trivial setting we consider the following simple 
example. 


Example 2.1. Consider the simplest possible scenario where $m = 2$ and the CK class specifies that $\mathbf{Y} = 
(Y_1, Y_2)^T$, where both $Y_1$ and $Y_2$ are binary. Here the random variable $Y_1$ is an indicator of whether or not 
a food stuff has become poisonous and $Y_2$ is an indicator of whether or not sufficient quality controls are in 
place to ensure that any contamination is detected before the food is distributed to the public. A family of 
sample distributions $\Pi_1^{y|\theta}$ given by panel $G_1$, expert in the processes that might lead to poisoning, is saturated 
so that $\theta_1 = P(Y_1 = 1)$. Panel $G_2$, consisting of experts with a good knowledge of quality control systems, 
has beliefs about the probabilities 

$$\theta_{20} \triangleq P(Y_2 = 1 \mid Y_1 = 0), \qquad \theta_{21} \triangleq P(Y_2 = 1 \mid Y_1 = 1).$$

Write $\boldsymbol{\theta}_2 = (\theta_{20}, \theta_{21})^T$. If within the CK class $\mathcal{U}$ is an arbitrary utility on $\mathbf{Y}$, then for adequacy the 
SB will need to be able to calculate her expected joint probability table of $\mathbf{Y}$, i.e. the expectations $\boldsymbol{\mu} = 
(\mu_{00}, \mu_{01}, \mu_{10}, \mu_{11})$ of $\boldsymbol{\theta} = (\theta_{00}, \theta_{01}, \theta_{10}, \theta_{11})$, where by definition 

$$\theta_{00} = (1 - \theta_1)(1 - \theta_{20}), \quad \theta_{01} = (1 - \theta_1)\theta_{20}, \quad \theta_{10} = \theta_1(1 - \theta_{21}), \quad \theta_{11} = \theta_1\theta_{21}.$$

Assume that prior panel independence is within the CK class: i.e. there is a consensus between the members 
of the two panels that $\boldsymbol{\theta}_2$ is independent of $\theta_1$. Then writing 

$$\mu_1(d) \triangleq E(\theta_1 \mid d), \qquad \boldsymbol{\mu}_2(d) = (\mu_{20}(d), \mu_{21}(d)) \triangleq (E(\theta_{20} \mid d), E(\theta_{21} \mid d)),$$

we would have that 

$$\mu_{00}(d) = (1 - \mu_1(d))(1 - \mu_{20}(d)), \quad \mu_{01}(d) = (1 - \mu_1(d))\mu_{20}(d),$$
$$\mu_{10}(d) = \mu_1(d)(1 - \mu_{21}(d)), \quad \mu_{11}(d) = \mu_1(d)\mu_{21}(d). \qquad (2.1)$$

Suppose panels $G_1$ and $G_2$ are able to calculate, respectively, 

$$\Pi_1^{\theta} = \{\mu_1(d) : d \in \mathcal{D}\}, \quad \text{and} \quad \Pi_2^{\theta} = \{\boldsymbol{\mu}_2(d) : d \in \mathcal{D}\}.$$

Then the IDSS is adequate. Because of the properties of expectations, in this case the belief summaries 
$G_i$ needs to deliver are simply $\Pi_i^{y} = \Pi_i^{\theta}$, $i \in [2]$. This provides the SB with all the information she needs to 
calculate all her expected utility functions using the formulae in (2.1). The IDSS is also sound since these 
inputs are consistent with the probabilistic beliefs of anyone with any probability model over $(\mathbf{Y}, \theta_1, \boldsymbol{\theta}_2)$ who 
believed the agreed prior panel independence assumptions and held the expectations given above. 
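
The calculation in Example 2.1 is small enough to sketch directly. The Python fragment below is an illustration only: it assumes prior panel independence as in the example, and the numerical panel means and the function names are hypothetical.

```python
# Sketch of the calculation in Example 2.1, assuming prior panel independence.
# The numerical panel means used below are hypothetical.

def expected_joint_table(mu1, mu20, mu21):
    """Expected probabilities of (Y1, Y2), keyed by (y1, y2).

    mu1  : panel G1's mean of theta_1  = P(Y1 = 1)
    mu20 : panel G2's mean of theta_20 = P(Y2 = 1 | Y1 = 0)
    mu21 : panel G2's mean of theta_21 = P(Y2 = 1 | Y1 = 1)
    """
    for m in (mu1, mu20, mu21):
        if not 0.0 <= m <= 1.0:
            raise ValueError("panel means must lie in [0, 1]")
    return {
        (0, 0): (1 - mu1) * (1 - mu20),
        (0, 1): (1 - mu1) * mu20,
        (1, 0): mu1 * (1 - mu21),
        (1, 1): mu1 * mu21,
    }


def expected_utility(table, utility):
    """Expected utility for one decision given its expected joint table."""
    return sum(p * utility(y1, y2) for (y1, y2), p in table.items())


table = expected_joint_table(mu1=0.2, mu20=0.9, mu21=0.6)
total = sum(table.values())  # the four expected cell probabilities sum to one
```

Only three numbers per decision cross the boundary between the panels and the SB, which is the sense in which the panels' summaries are short.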


Note that, for any $d \in \mathcal{D}$, it is not a trivial condition that the SB can make the calculations she needs in 
terms of $\Pi_i^{y}$, $i \in [2]$. For example, if instead of providing its beliefs about the conditional probabilities, panel 
$G_2$ provided its beliefs about the margin of $Y_2$, the marginal joint distribution of $\mathbf{Y} = (Y_1, Y_2)$ would not 
then be fully recoverable since we have nothing from which to derive, for example, the covariance between 
$\theta_1$ and $\theta_2$ which is needed to calculate the covariance between $Y_1$ and $Y_2$. So, if structuring of the process is 
not performed beforehand, then post-hoc combinations of outputs from panels’ models may not be formally 
possible. 


2.3 An illustration of some of the inferential challenges 


It is convenient at this stage to use another very simple example to illustrate which statistics need to be 
communicated by panels to an IDSS. 


Example 2.2. Assume a CK class gives $(Y_1, Y_2)$ the same meaning and sample space as in Example 2.1. 
However add to the CK class the additional structural assumption that $Y_2 \perp\!\!\!\perp Y_1 \mid (\theta_1, \theta_2)$ whatever decision 
$d \in \mathcal{D}$ is made. Thus, once the probabilities of these events were known, it is generally accepted that 
learning that contamination had been introduced would not affect our judgements about the efficacy of the 
quality control regime. Suppose $G_i$ delivers the set of beta distributions $Be(\alpha_i(d), \beta_i(d))$ for $\theta_i = P(Y_i = 1)$, 
$d \in \mathcal{D}$, $i \in [2]$. Note that because of the structural assumption above, in the notation used in Example 2.1, 
$\theta_2 = \theta_{20} = \theta_{21}$. Consider two possible CK classes, where a decision centre is known to draw its utilities 
$U_i \in \mathcal{U}_i$, $i \in [2]$, from one of the families below: 

$$U_1(y_1, y_2, d) = a_1(d) + b_{11}(d)\,y_1 + b_{12}(d)\,y_2, \quad \text{if } U_1 \in \mathcal{U}_1,$$
$$U_2(y_1, y_2, d) = a_2(d) + b_2(d)\,w, \quad \text{if } U_2 \in \mathcal{U}_2,$$

and $W = Y_1 Y_2$, with value $w$, is an indicator of whether the public is exposed to the contamination, with $a_1(d), a_2(d) \in \mathbb{R}$ 
and $b_{11}(d), b_{12}(d), b_2(d) \in \mathbb{R}_{>0}$ for all $d \in \mathcal{D}$. If $\mathcal{U}_1$ is in the CK class then the SB needs only that $G_i$ 
supplies its mean $\mu_i(d) = \alpha_i(d)(\alpha_i(d) + \beta_i(d))^{-1}$ of $\theta_i$, $i \in [2]$, as a function of the decision $d \in \mathcal{D}$ taken: a 
simple one-dimensional summary. However if, instead, $\mathcal{U}_2$ is in the CK class then the SB needs to be able 
to calculate 

$$E(W \mid d) = E(\theta_1 \theta_2 \mid d),$$

for each $d \in \mathcal{D}$. It is easily checked that the above panel summaries would no longer necessarily be adequate 
if $\mathcal{U}_2$ was in the CK class unless further assumptions were added. Explicitly, the SB would also need to add 
to the CK class a global independence assumption $\theta_1 \perp\!\!\!\perp \theta_2$. If this was done then the expectation of $\theta = \theta_1\theta_2$ 
would be recoverable from the panels’ expectations and thus 

$$E(W \mid d) = E(\theta \mid d) = \mu_1(d)\mu_2(d),$$

would be well defined. So the IDSS would be adequate. 
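
The contrast between the two utility families in Example 2.2 can be sketched as follows. This is an illustrative fragment only: the beta hyperparameters and utility coefficients are hypothetical, and the second score is valid only under the global independence assumption stated in the example.

```python
# Sketch for Example 2.2: what each utility family asks of the panels.
# Beta hyperparameters and utility coefficients below are hypothetical.

def beta_mean(alpha, beta):
    """Mean alpha / (alpha + beta) of a Be(alpha, beta) distribution."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("beta hyperparameters must be positive")
    return alpha / (alpha + beta)


def score_u1(a1, b11, b12, mu1, mu2):
    """Expected utility under U1 = a1 + b11*y1 + b12*y2: needs only the two means."""
    return a1 + b11 * mu1 + b12 * mu2


def score_u2(a2, b2, mu1, mu2):
    """Expected utility under U2 = a2 + b2*w, where w = y1*y2.

    Uses E(W | d) = mu1 * mu2, which holds only if theta_1 and theta_2
    are globally independent.
    """
    return a2 + b2 * mu1 * mu2


mu1 = beta_mean(2.0, 8.0)  # panel G1's delivered mean
mu2 = beta_mean(3.0, 3.0)  # panel G2's delivered mean
u1 = score_u1(0.0, 1.0, 1.0, mu1, mu2)
u2 = score_u2(0.0, 1.0, mu1, mu2)
```

Without global independence, `score_u2` would need the expectation of the product $\theta_1\theta_2$, which neither panel can supply on its own.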

To be feasible and of enduring usefulness, it is usually necessary to require that the IDSS is distributive 
so that panels can autonomously update their probabilistic beliefs about their area of responsibility as they 
receive new information. 


Example 2.3. Assume a random vector $(X_1, X_2)$ is sampled from the same population as $(Y_1, Y_2)$ in the 
model of Example 2.2 and that, for each $d \in \mathcal{D}$, $\theta_1 \perp\!\!\!\perp \theta_2$ is in the CK class. Each panel $G_i$ next refines 
its probabilistic assessments by observing its own separate randomly sampled populations, $x_i$, concerning $\theta_i$ 
alone, and then updates its parameter densities, given each $d \in \mathcal{D}$, from $\pi_i(\theta_i \mid d)$ to $\pi_i(\theta_i \mid x_i, d)$, $i \in [2]$. In 
this case, the two panels need to deliver only their posterior means $\mu_i^*(d, x_i)$, $i \in [2]$, $d \in \mathcal{D}$. The SB can then 
act coherently. By adopting all these beliefs as her own, she will act as if she had sight of all the available 
information and had processed this information herself. The IDSS is therefore sound and distributed. 
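
A minimal sketch of the autonomous updating in Example 2.3 follows. It assumes, for illustration, that each panel holds a beta prior and observes a binomial sample, so that the update is conjugate; the prior hyperparameters and sample counts are hypothetical.

```python
# Sketch for Example 2.3: autonomous conjugate updating by each panel.
# Assumes Be(alpha, beta) priors and binomial samples; all numbers are hypothetical.

def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean of theta under a Be(alpha, beta) prior after binomial data."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return (alpha + successes) / (alpha + beta + trials)


# Each panel updates using only its own separate sample.
mu1_star = posterior_mean(1.0, 1.0, successes=3, trials=18)  # panel G1
mu2_star = posterior_mean(1.0, 1.0, successes=7, trials=8)   # panel G2

# Under global independence and separate samples, the SB's expectation of
# theta_1 * theta_2 is the product of the two delivered posterior means.
expected_w = mu1_star * mu2_star
```

The SB never sees either sample: she receives two numbers per decision and multiplies them, which is legitimate here only because the samples are separate and the parameters are independent a priori.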


Note, however, that in the example above the global independence assumption is critical for this 
distributivity property to hold. 


Example 2.4. Suppose that $\theta_2$ is not independent of $\theta_1$ so that $\pi(\theta_2 \mid \theta_1)$ needs to be a function of $\theta_1$ for 
at least some $d \in \mathcal{D}' \subseteq \mathcal{D}$. Then, in the notation of the example above, for these $d \in \mathcal{D}'$, 

$$\pi_2(\theta_2 \mid x_1, x_2, d) = \int \pi_2(\theta_2 \mid \theta_1, x_2, d)\, \pi_1(\theta_1 \mid x_1, d)\, d\theta_1,$$

where the prior dependence of $\theta_2$ on $\theta_1$ induces a dependence of $\theta_2$ on $x_1$. So, in particular, $\mu_2^*(d, x_1, x_2) \neq 
\mu_2^*(d, x_2)$ in general. Therefore, by devolving inference to the two panels who learn autonomously, the SB 
will not be acting as a single Bayesian would by using $\mu_i^*(d, x_i)$, $i \in [2]$. It follows that the system is no 
longer sound, although when supporting evidence remains unseen the SB will appear to act coherently. The 
explanation of her inferences can no longer be devolved to a single panel and so is difficult to defend. She will, 
implicitly, be assuming that $\theta_1 \perp\!\!\!\perp \theta_2$, which is contrary to the reasoning $G_2$ would want to provide. 


Perhaps of even more importance is to note that, even if global independence is justified a priori, the 
assumption that data collected by the two panels and individually used to adjust their beliefs does not inform 
both parameters is also a critical one. There are two important special cases we examine below which give 
a flavour of this difficulty and illustrate why it is important to construct panels not only on the basis of 
domains of expertise but also whose composition matches, as far as possible, domains over which supporting 
vectors of measurements exist. 


Example 2.5. Continuing from the last example, suppose that $G_1$ and $G_2$ both see their respective margin 
concerning the experiment in Table 2, where 100 units from a population are randomly sampled, and each 
uses this experiment to update its respective marginal distribution on $\theta_i$ for a particular value of $d \in \mathcal{D}$, 
$i \in [2]$. Then, if both began with a prior symmetric about 0.5, each would believe that 

$$\mu_1^*(d, x_1) = \mu_2^*(d, x_2) = 0.5.$$

    Y_1 \ Y_2       0                 1       Total
    0               5                45       50
    1              45                 5       50 ($x_1$)
    Total          50 ($n - x_2$)    50      100

Table 2: Experiment of Example 2.5 

So were $\theta_1 \perp\!\!\!\perp \theta_2$ in the CK class, the utility function $U \in \mathcal{U}_2$ and the data the individual panels used naively 
restricted to the relevant margin, the IDSS would assign 

$$\mu^*(d, x_1, x_2) = \mu_1^*(d, x_1)\,\mu_2^*(d, x_2) = 0.25.$$

Note that this inference contrasts with inferences the SB would make on seeing the whole table and assuming 
$\theta_1 \perp\!\!\!\perp \theta_2$ a priori. With a fairly uninformative prior on the two margins, her posterior mean of $\theta_1\theta_2$ would be 
approximately 0.05, i.e. five times smaller than the expectation calculated above. 
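
The gap between the two answers in Example 2.5 can be checked numerically. The sketch below is an illustration under assumptions that are not in the text: uniform Be(1, 1) priors on the two margins, and a Dirichlet(1, 1, 1, 1) prior over the four cells as one simple way of letting the SB use the whole table.

```python
# Numerical check for Example 2.5. The uniform Be(1, 1) priors on the margins
# and the Dirichlet(1, 1, 1, 1) prior on the four cells are assumptions made
# purely for illustration.

counts = {(0, 0): 5, (0, 1): 45, (1, 0): 45, (1, 1): 5}  # keys are (y1, y2)
n = sum(counts.values())

# Margins seen separately by the two panels.
x1 = sum(c for (y1, _), c in counts.items() if y1 == 1)
x2 = sum(c for (_, y2), c in counts.items() if y2 == 1)

mu1_star = (1 + x1) / (2 + n)  # G1's posterior mean of theta_1
mu2_star = (1 + x2) / (2 + n)  # G2's posterior mean of theta_2
naive = mu1_star * mu2_star    # product rule applied to the two margins

# Posterior mean of P(Y1 = 1, Y2 = 1) when the whole table is used.
joint = (1 + counts[(1, 1)]) / (4 + n)

print(f"naive product: {naive:.3f}, whole table: {joint:.3f}")
```

Under these assumed priors the margins give 0.25 while the whole table gives a little under 0.06, the same order of magnitude as the value of approximately 0.05 quoted in the example.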

So, when incorporating joint data of this type, it is not easy to preserve soundness. Experiments measuring 
a function of the variables with error - even when these functions relate directly to the predictions at 
hand - can induce similar difficulties. 

Example 2.6. Continuing the setting in the example above, suppose it is only possible to see the table 
of randomly sampled counts associated with $W = Y_1 Y_2$, i.e. the number of foodstuffs that have poisoned 
someone. Suppose the 100 individuals in Table 2 could be thought of as having been drawn from a Binomial 
experiment with $x_w = 5$ values of $W = 1$ within the sample. Suppose the SB uses this information directly: 
for example by introducing a uniform prior on $\theta_w = P(W = 1)$. This would lead the SB to have a posterior 
mean $\mu^*(d, x_w)$ of approximately 0.05, in line with the whole-table value in Example 2.5. However, observations have induced a dependence across $\theta_1$ and $\theta_2$: 
the global independence assumption is no longer formally valid and, if we plan to demand that the IDSS is 
sound, the future distributivity of the system will be destroyed if this data is accommodated, and so frustrate 
future calculations. 

So, we have illustrated that, even in the simplest of networks, considerable care needs to be exercised 
before an IDSS can be expected to work reliably. Because of the simplicity of the examples above only means 
needed to be delivered. However once we move away from the case of two binary variables this is, generally, 
not the case. Nevertheless, we will see later that, as we increase the complexity of our problems, often each 
panel will need only to provide certain additional lower-order moments. In the next section we will prove 
some conditions which ensure our IDSS, however large, will be sound. We will also discuss protocols for 
admitting data, that might be adopted by panels, designed to avoid the sorts of issues illustrated in the last 
two examples. 


3 Coherence and the IDSS 

3.1 Conditions to ensure a sound and distributive IDSS 

3.1.1 Information and Admissibility Protocols 

Suppose the IDSS is dynamically presented with a large amount of new information as time progresses. In 
practice, within the totality of information $J^t$ conceivably available to panellists at time $t$, usually only a 
subset - the admissible evidence - will be of sufficient quality and have suitable form to be integrated into 
an IDSS. The sorts of information excluded or delayed admittance might include evidence whose relevance 
is ambiguous or of a type which might introduce insurmountable computational challenges to an IDSS. An 
admissibility protocol is therefore needed to define the admissible evidence so that inferences made using the 
IDSS can be defended and feasibly and formally analysed within a required time frame. 

Let $I_0^t$ denote all the admissible evidence which is common knowledge to all panel members at time $t$. 
Let $I_{ij}^t$ denote the subset of this admissible evidence panel $G_i$ would use at time $t$ if acting autonomously 
to assess their beliefs about $\boldsymbol{\theta}_j$, $i, j \in [m]$. Define the admissible evidence as $I_+^t \triangleq \{I_{ij}^t : i, j \in [m]\}$ and let 
$I_*^t \triangleq \{I_{ii}^t : i \in [m]\}$ be the subset of the admissible evidence each panel $G_i$ would use to update $\boldsymbol{\theta}_i$, $i \in [m]$. 

Of course, the existence of relevant protocols for the selection of good quality evidence for decision support 
is often assumed even within single agent systems; however, its explicit statement is frequently omitted. A 
notable exception is admissibility protocols for evidence concerning medical treatment where the Cochrane 
reviews are considered to be the gold standard in decision support (Higgins and Green, 2008). Their purpose 
is to pare away information which might be ambiguous and so potentially distort inference, through a trusted 
set of principles relevant to the domain. This may seem restrictive, but in practical applications the need to 
be selective about experiments that can provide evidence of an acceptable quality before formally committing 
to policy - so that their adoption can be robustly defended - is universally acknowledged. The corresponding 
beliefs expressed within any support tool are therefore often, by their nature, conservative. Formal adoption 
in $I_+^t$ of evidence whose interpretation might be contentious should be avoided whenever possible. Such 
information is best used as a supplement to the sound inference rather than being integrated into it. Note, 
however, that information excluded from $I_+^t$ can also be used formally by users and panellists to provide diagnostic 
checks of the inference using $I_+^t$ alone. 

The demands for such an admissibility protocol of an IDSS are even more important than for standard 
single agent decision support, because of its collective structure. So here we assume that panels, both 
individually and corporately, will agree an appropriate protocol for selecting suitable experimental evidence 
in line with good practices, mirroring Cochrane reviews in ways relevant to their domain. However, one 
additional requirement is needed in this setting: the chosen admissibility protocol must also ensure that an 
IDSS remains distributed over time, for we have already argued that, if this is not the case, then the output 
of the IDSS is either dependent on arbitrary assumptions and difficult to calculate or, if distributivity is 
forced, will become incoherent. We now explore the properties of a candidate set of admissible evidence $I_+^t$
that lead to an IDSS being both defensible and coherent.


3.1.2 Conditional independence in CK parametric models 


We begin by assuming that the collective, as represented by the SB, is happy for its inferences to obey
the (qualitative) semi-graphoid axioms given in the appendix of this paper (Pearl 1988; Smith 2010).
These general properties are widely accepted as appropriate for reasoning about evidence when irrelevance
statements are read as conditional independence statements: in particular Bayesian systems always respect
these properties (Dawid 2001; Studeny 2006). Irrelevance statements can also be expressed in common
language and so are more likely to form part of the common knowledge shared by a set of panellists (Smith
1996). We later investigate how these ideas translate when all panellists are fully Bayesian.

In this setting, it is useful to recall that one definition of a parameter $\theta \in \Theta$ in any parametric
model can be phrased in terms of irrelevance. Explicitly, it might be common knowledge that everything
relevant to the future random vector $Y$ that might be chosen from the totality $I^t$ of past information
available at time $t$ is embodied in what we have learned about a parameter $\theta \in \Theta$ and $I_+^t$. This would then
imply the irrelevance statement that for all $d \in D$

$$Y \perp\!\!\!\perp I^t \mid \theta, I_+^t, d.$$

Here $I_+^t$ in particular contains the structural assumptions of the model, $S$, and the sample distributions
delivered by each panel. So, the SB’s parameter vector is a concatenation of subvectors $\theta = (\theta_i)_{i \in [m]}$,
where $G_i$ parametrises their delivered sample distributions by $\theta_i$, $i \in [m]$.


Now suppose it is common knowledge that the expected utilities $U(d \mid I_+^t)$, $d \in D$, posterior to observing
$I_+^t$ and used by the SB to score her possible options can all be written in the form

$$U(d \mid I_+^t) = E_{Y, \theta \mid I_+^t}\big(r(Y, \theta, d)\big),$$

for some function $r(Y, \theta, d)$. Further suppose it is common knowledge that $r(Y, \theta, d)$ has the form

$$r(Y, \theta, d) = \sum_{B \subseteq [m]} \prod_{i \in B} \tau_{i,B}(Y, d)\, \rho_{i,B}(\theta_i, Y, d)$$

where, for $i \in [m]$, the set $\{\tau_{i,B}(Y, d) : B \subseteq [m], i \in B, d \in D, Y \in \mathcal{Y}(d)\}$ is known by $G_i$, a condition that
encompasses many common classes of model (Leonelli and Smith, 2015). We will also see later that, assuming
most common structural frameworks as common knowledge, then under utility, policy and structural
consensus the functions $\rho_{i,B}(\theta_i, Y, d)$ are simple functions of $\Pi_i^t$, $i \in [m]$.

Definition 3.1. Say that a CK class of an IDSS exhibits panel independence at time $t$ iff the
SB believes that under any policy $d \in D$

$$\perp\!\!\!\perp_{i \in [m]} \theta_i \mid I_+^t, d.$$

Then if the IDSS exhibits panel independence, the expected utilities $U(d)$ can be calculated using the formula

$$U(d \mid I_+^t) = \sum_{B \subseteq [m]} E_{Y \mid I_+^t}\Big( \prod_{i \in B} \tau_{i,B}(Y, d)\, \bar{\rho}_{i,B}(Y, d) \Big)$$

where $\bar{\rho}_{i,B}(Y, d) = E_{\theta_i \mid I_+^t}\big(\rho_{i,B}(\theta_i, Y, d)\big)$. Now the key point here is that in this case the set of expectations

$$\bar{\rho}_i = \{\bar{\rho}_{i,B}(Y, d) : B \subseteq [m], i \in B, d \in D, Y \in \mathcal{Y}(d)\},$$

can be calculated locally by panel $G_i$, $i \in [m]$. So all panels, and therefore the SB, can agree that a sufficient
condition for a system to remain distributed and adequate over all time is panel independence. So we next
investigate conditions that will ensure that panel independence holds.
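
The role panel independence plays in this argument can be checked numerically. The following is a minimal sketch, assuming two hypothetical panels with discrete beliefs and made-up functions `rho_1` and `rho_2` standing in for the $\rho_{i,B}$ (none of these names or numbers come from an actual IDSS). The product of the locally computed expectations agrees with the joint expectation when the parameters are independent, and need not agree when they are dependent, even with identical margins.

```python
# Hypothetical panel beliefs: value of theta_i -> probability (illustrative numbers only).
belief_1 = {0.2: 0.5, 0.6: 0.5}
belief_2 = {1.0: 0.25, 3.0: 0.75}

def rho_1(theta_1):
    return theta_1 ** 2

def rho_2(theta_2):
    return 1.0 / theta_2

def local_expectation(belief, rho):
    """Expectation a panel can compute from its own beliefs alone."""
    return sum(p * rho(theta) for theta, p in belief.items())

def joint_expectation(joint):
    """Expectation of rho_1 * rho_2 under a joint belief over (theta_1, theta_2)."""
    return sum(p * rho_1(t1) * rho_2(t2) for (t1, t2), p in joint.items())

# Joint belief under panel independence: the product of the two margins.
independent_joint = {(t1, t2): p1 * p2
                     for t1, p1 in belief_1.items()
                     for t2, p2 in belief_2.items()}

# A dependent joint belief with exactly the same margins as above.
dependent_joint = {(0.2, 1.0): 0.25, (0.2, 3.0): 0.25,
                   (0.6, 1.0): 0.00, (0.6, 3.0): 0.50}

# What the SB obtains by multiplying the two locally delivered expectations.
distributed = local_expectation(belief_1, rho_1) * local_expectation(belief_2, rho_2)

print(distributed, joint_expectation(independent_joint), joint_expectation(dependent_joint))
```

In the dependent case the SB would need the joint belief itself, which no single panel holds, so the calculation could no longer be devolved.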


3.1.3 Panel Independence and Common Knowledge 

We now define four properties to add to the CK class to ensure the soundness of an IDSS under a given 
admissibility protocol. Let $\theta_{i^-} = (\theta_j)_{j \in [m] \setminus \{i\}}$.

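Definition 3.2. Say that a CK class of an IDSS is delegable at time $t$ if for any possible choice of policy
$d \in D$ and for $i \in [m]$ there is a consensus that for all $\theta \in \Theta(d)$

$$I_+^t \perp\!\!\!\perp \theta \mid I_0^t, I_*^t, d, \tag{3.1}$$

separately informed at time $t$ if

$$I_{ii}^t \perp\!\!\!\perp \theta_{i^-} \mid I_0^t, \theta_i, d, \tag{3.2}$$

cutting at time $t$ if

$$I_*^t \perp\!\!\!\perp \theta_i \mid I_0^t, I_{ii}^t, \theta_{i^-}, d, \tag{3.3}$$

commonly separated at time $t$ if

$$\perp\!\!\!\perp_{i \in [m]} \theta_i \mid I_0^t, d. \tag{3.4}$$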


An IDSS is delegable at time $t$ when it is in the CK class that, for any choice of future policy $d \in D$,
the totality of admissible evidence $I_+^t$ fed into the IDSS is the union of the evidence $I_0^t$ shared by all panels
plus the aggregate of individual admissible evidence $I_*^t$ each panel has about its own particular domain of
expertise at time $t$. Note that if the panel members are working collaboratively rather than competitively
then this condition might be ensured through adopting a protocol where if one panel has new evidence
which they think might inform another then they will immediately pass this on to that panel for appropriate
accommodation: see below. Alternatively the protocol could itself simply demand that $I_+^t = \{I_0^t, I_*^t\}$.

The next two assumptions then allow us to perform inference in the distributed way we will develop later.
When a system is separately informed, pieces of evidence $G_i$ might collect individually will not be informative
about the other parameters in the IDSS owned by other panellists once the domain experts’ evidence has
been fed in. When a system is cutting, once $\{I_0^t, I_{ii}^t\}$ is known, no panel believes that another panel $G_j$ has
used any information that $G_i$ might also want to use to adjust its beliefs about $\theta_i$. This captures what we
might mean when we call a panel ‘expert’ over a particular domain. So, for example, to accommodate a piece
of evidence, $G_i$ might first have needed to marginalise out a parameter in $\theta_j$ because its sample distribution
depended on this component. In this case the IDSS would not be cutting: this experiment told the SB not
only about $\theta_i$ but also $\theta_j$. Formally, the assessment of these two parameters could then become dependent
on each other a posteriori, as illustrated in the last section. In practice the protocol would demand, to
satisfy the separately informed condition, that a new piece of information would only be added to $I_{ii}^t$ by $G_i$
if the strength of evidence it provided about $\theta_i$ would not depend on $\theta_{i^-}$. We see later that for a variety of
overarching structures much information available to $G_i$ would satisfy this demand. Typically this condition
is broken with the loss of ancestral sampling in a BN (see, for example, Smith 2010).

When parameters are commonly separated all the information that everyone shares separates the parameters
in the system. Suppose all panels were constituted by the same people, the overarching system was
a BN and the panels consisted of those deciding on the parameters $\theta_i$ of the density of each vertex $Y_i$, i.e.
those defining the distribution of each variable conditional on its parents. Then this would reduce to the
condition that global independence held at time $t$ (Cowell et al. 1999). From its proof, panel independence
can actually be seen to be a consequence of the other properties we prove in Theorem 3.1 below.

Now assume that the four conditions in Definition 3.2 all lie in $S$. The following result, analogous to
Goldstein and O’Hagan (1996) which concerned the use of linear Bayes methods and a single agent, can now
be proved.

Theorem 3.1. Suppose an IDSS for a CK class $(U, D, S)$ is adequate where $U$ and $D$ are arbitrary and $S$
includes the consensus that the IDSS is delegable, separately informed, cutting and commonly separated at
time $t$. Then it will also be sound and distributed at time $t$. Furthermore it is common knowledge that the
SB’s beliefs about each panel’s parameter vector $\theta_i$ are the same as those of the corresponding expert panel
$G_i$, $i \in [m]$, for all $d \in D$ and at any time $t > 0$.


See Appendix 7.1 for a proof of this result. Certainly the conditions required in Theorem 3.1 above
are in no sense automatic. We will show that, nevertheless, they are satisfied by a very diverse collection of
models and information sets. So for example suppose that within our food security example $G_1$ was a panel
with expertise in food production and using a piece of probabilistic software for a model parametrized by $\theta_1$.
Suppose on the other hand that $G_2$ was a panel with expertise in the effect of nutrition on a child’s educational
attainment. Suppose that this panel based its judgements on a regression model of such attainment on various
food production indicator covariates parametrized by $\theta_2$. Then delegability would be the assumption that
the composition of domain information available to these two panels covered the two areas - i.e. that
both panels could be thought of as expert. The separately informed and cutting hypotheses assert that
neither expert has information available to them that they will use in their own assessments of their own
parameters that could also usefully be used by other panellists. The commonly separated hypothesis assumes
the priors based on commonly available information could be set independently of each other or other panels
in the system. These are substantive assertions of course but both $G_1$ and $G_2$ should be able to reflect
on whether such assumptions might be compelling. When considering the influence of their beliefs on each
other at least it is likely that $G_1$ and $G_2$ will be happy to accept the premises of this theorem and so its
conclusions, unless an experiment becomes available that might confound these parameter vectors - see later.

Note that this theorem holds irrespective of the form of the utility function in the CK class: weaker
conditions might guarantee adequacy for specific classes of utility factorizations (Leonelli et al. 2015a).
Moreover, it applies whatever the definition of the underlying semi-graphoid, not only to probabilistic
systems.


3.1.4 Likelihood separation in distributed probabilistic systems 

We now focus our attention onto probabilistic systems and examine what soundness and distributivity might
mean in this most common of contexts. Suppose an IDSS exhibits panel independence at time $t = 0$ so
that the SB believes that $\perp\!\!\!\perp_{i \in [m]} \theta_i \mid (I_+^0, d)$, $d \in D$. Assume also that the only additional evidence presented
to the IDSS by time $t$ by any panellist will be in the form of data sets $x^t = \{x_\tau : \tau \leq t\}$ which then populate
$I_+^t$. The feature that ensures the IDSS remains sound and distributed over time can be expressed in terms
of the separability of a likelihood. Let $l(\theta \mid x^t)$, $t > 0$, denote a likelihood over the parameter $\theta$ of the
distribution of $Y$ given $x^t$. Recall that subvectors of parameters associated with the probabilistic features
delivered by the panel $G_i$ are denoted by $\theta_i$, $i \in [m]$.

Definition 3.3. Call $l(\theta \mid x^t)$ panel separable over the panel subvectors $\theta_i$, $i \in [m]$, when, given admissible
evidence $x^t$, it is in the CK class that for all $d \in D$

$$l(\theta \mid x^t) = \prod_{i \in [m]} l_i(\theta_i \mid t_i(x^t))$$

where $l_i(\theta_i \mid t_i(x^t))$ is a function of $\theta$ only through $\theta_i$ and $t_i(x^t)$ is a statistic of $x^t$ known to $G_i$ and perhaps
others, collected under the admissibility protocol and accommodated formally by $G_i$ into $I_{ii}^t$ to form its own
posterior assessment of $\theta_i$, $i \in [m]$.

We now have the following theorem that gives good practical guidance about when and how soundness 
and delegatability can be preserved over time. 

Theorem 3.2. Suppose an IDSS is adequate, delegable, separately informed, cutting and commonly separated
at time $t = 0$ and, for all times $t > 0$, data admitted to the system is panel separable at time $t$. Then, provided
the joint prior over $\theta$ is absolutely continuous with respect to Lebesgue measure, the system is sound and
distributed at time $t$. On the other hand if at any time $t$ the system is not panel separable over a set of
non-zero prior measures over the parameter vector $\theta$ then the IDSS will no longer be sound or distributed.

See Appendix 7.2 for a proof of this result. So, for example, designing a single experiment to be
orthogonal over parameters $\theta_i$ and $\theta_j$, where $\theta_i$ are parameters in $G_i$’s model and $\theta_j$ in $G_j$’s, ensures, under
the conditions of the theorem, that an IDSS is sound and distributed, $i, j \in [m]$. Data from this single experiment
can still be included in the admissible evidence for both panels $G_i$ and $G_j$ and the system will still
remain distributive. Note, however, that the converse demonstrates that some protocol is certainly needed
to preserve the distributivity property, even approximately.
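
Both directions of Theorem 3.2 can be illustrated with a small numerical sketch. It assumes two hypothetical panels, each owning one probability parameter, with independent uniform priors approximated on a finite grid (so the grid stands in for the absolutely continuous prior of the theorem). The two likelihoods and all counts are invented for illustration. The function `independence_gap` measures how far the joint posterior is from the product of its own margins.

```python
# Grid approximation to independent uniform priors on (theta_1, theta_2).
grid = [0.05 + 0.05 * k for k in range(19)]  # 0.05, 0.10, ..., 0.95

def posterior(likelihood):
    """Normalised posterior on the grid (the uniform prior cancels on normalising)."""
    weights = {(a, b): likelihood(a, b) for a in grid for b in grid}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

def independence_gap(post):
    """Largest absolute gap between the joint posterior and the product of its margins."""
    m1 = {a: sum(post[(a, b)] for b in grid) for a in grid}
    m2 = {b: sum(post[(a, b)] for a in grid) for b in grid}
    return max(abs(post[(a, b)] - m1[a] * m2[b]) for a in grid for b in grid)

def separable(a, b):
    # Panel 1: 3 successes in 10 trials governed by theta_1 alone;
    # panel 2: 6 successes in 10 trials governed by theta_2 alone.
    return a**3 * (1 - a)**7 * b**6 * (1 - b)**4

def non_separable(a, b):
    # One experiment whose success probability is theta_1 * theta_2:
    # the likelihood does not factorise over the two panels.
    return (a * b)**3 * (1 - a * b)**7

gap_separable = independence_gap(posterior(separable))
gap_non_separable = independence_gap(posterior(non_separable))
print(gap_separable, gap_non_separable)
```

Under the separable likelihood the gap is zero up to rounding, so each panel can update its own margin autonomously. Under the non-separable likelihood the posterior is visibly dependent and the product of panel margins no longer reproduces it.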

Henceforth in this paper, until the discussion, we will assume all data admitted into an IDSS ensures this 
separability property. 


3.2 Causality and the IDSS 

3.2.1 Causal hypotheses: control and experiments 


Recent advances have been made in formalising causal hypotheses in order to make inferences about the
extent of a cause. Most of the original work in this area centred on BNs (Pearl 2000; Spirtes et al. 1993).
However, the semantics have since been extended into other frameworks (see e.g. Dawid 2002; Dawid and
Didelez 2010; Eichler and Didelez 2007; Lauritzen and Richardson 2002; Queen and Albers 2009; Smith and
Figueroa 2007). Typically, these assume that there is a partial order associated to the system (Riccomagno
and Smith 2004) and that, within a causal framework and its implied partial order, the joint distributions
of variables not downstream of a variable, which is externally controlled to take a particular value, remain
unaffected by the control, whilst the effect on downstream variables is the same as if the controlled variable
had simply taken that value. These causal semantics are a special case of the property described below in a
sense we explain later. If this property is contained in the CK class then this greatly simplifies the learning
each panel needs to undertake.


Definition 3.4. Call an IDSS $(D, d^0)$-determined if $\Pi_i^t = \{\Pi_i^t(A_i, d) : d \in D, A_i \in \mathcal{A}_i\}$ is a stochastic function
of $\Pi_i^t(A_i, d^0)$, known to $G_i$, for some prescribed decision $d^0 \in D$ and a specific $A_i \in \mathcal{A}_i$, this being true
for the beliefs of all panels $G_i$, $i \in [m]$.
Once each panel has specified its beliefs about its own parameter vector $\theta_i$ under a particular decision
then, clearly, if the above property holds, the panel beliefs under other decisions can be calculated automatically.
Obviously whether and how this condition is satisfied depends heavily on the domain of the IDSS.
However, surprisingly it is often met, sometimes implicitly as in the Kalman Filter (Brockwell and Davis
2002). Perhaps more critically, it is also implicit for causal systems in the following sense.


Lemma 3.1. A causal BN, whose vertex set is $\{Y_i : i \in [m]\}$, in a given CK class is a $(D, d^0)$-determined
IDSS whenever the following three conditions hold:

1. $\theta$ consists of the entries in the conditional probability tables of the BN;

2. $d^0$ is the decision not to intervene but simply observe and $D$ consists of decisions $d(y_k)$ of setting a
component $Y_k$ to one of its particular levels $y_k$;

3. a single panel is responsible for the whole of a particular conditional probability table of each $Y_i$, $i \in [m]$,
given a particular configuration of its parents.


Proof. This derives directly from the definition in Shpitser and Pearl (2008) of a causal BN. Here we have
three effects:

1. the impact of setting a variable $Y_k$ to a value $y_k$ will first have the effect of making the probability of
the event $Y_k = y_k$ equal to one, conditional on any configuration of its parents.

2. all conditional probability tables in the BN having $Y_k$ as a conditioning variable are replaced by the
ones obtained by conditioning on the event $Y_k = y_k$.

3. the conditional probability tables of all other components of $Y$ are left unchanged.

This means, in particular that, as demanded by our definition, $\Pi_i^t(A_i, d(y_k))$ can be calculated as a simple
function of $\Pi_i^t(A_i, d^0)$. So this causal BN is $(D, d^0)$-determined. □
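
The three effects listed in the proof can be traced on a toy example. The sketch below assumes a hypothetical binary BN with edges $Y_1 \to Y_2$, $Y_1 \to Y_3$ and $Y_2 \to Y_3$ and invented table entries. The tables under the control $d(y_2 = 1)$ are built from the observational tables alone, and the result is compared with simply conditioning on $Y_2 = 1$ in the observational joint.

```python
from itertools import product

LEVELS = (0, 1)

# Observational tables (decision d0: observe, do not intervene); numbers are made up.
p_y1 = {0: 0.7, 1: 0.3}
p_y2_given_y1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [y1][y2]
p_y3_given_y1_y2 = {                                         # indexed [(y1, y2)][y3]
    (0, 0): {0: 0.95, 1: 0.05},
    (0, 1): {0: 0.60, 1: 0.40},
    (1, 0): {0: 0.70, 1: 0.30},
    (1, 1): {0: 0.20, 1: 0.80},
}

def tables_under_control(level):
    """Tables under d(y2 = level), derived only from the observational tables."""
    # Effect 1: Y2 takes the set level with probability one, whatever its parent.
    forced_y2 = {y1: {y2: float(y2 == level) for y2 in LEVELS} for y1 in LEVELS}
    # Effects 2 and 3: the other tables are reused unchanged; only the rows of
    # Y3's table with y2 == level receive positive weight in the joint.
    return p_y1, forced_y2, p_y3_given_y1_y2

def joint(tables):
    t1, t2, t3 = tables
    return {(a, b, c): t1[a] * t2[a][b] * t3[(a, b)][c]
            for a, b, c in product(LEVELS, repeat=3)}

observed = joint((p_y1, p_y2_given_y1, p_y3_given_y1_y2))
controlled = joint(tables_under_control(1))

p_y3_control = sum(p for (_, _, y3), p in controlled.items() if y3 == 1)
p_y2_observed = sum(p for (_, y2, _), p in observed.items() if y2 == 1)
p_y3_conditioned = sum(p for (_, y2, y3), p in observed.items()
                       if y2 == 1 and y3 == 1) / p_y2_observed
print(p_y3_control, p_y3_conditioned)
```

Because $Y_1$ influences both $Y_2$ and $Y_3$ here, the probability of $Y_3 = 1$ under control differs from the conditional probability given $Y_2 = 1$, yet every table needed under control was read off the panels' observational beliefs.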

It also can be easily checked that, for example, causal hypotheses for other frameworks, such as chain
event graphs (Smith and Anderson 2008) and multiregression dynamic models (MDMs) (Queen and Smith
1993) provide a $(D, d^0)$-determined IDSS where $(D, d^0)$ are analogously defined. Furthermore, at least in
an adapted form, these causal hypotheses often relate to ones the SB would want to make within an IDSS.
For example, in the scenario of exposure of cattle to disease, various culling regimes may be proposed to
limit the exposure of susceptible cattle and help ameliorate the spread of the disease. Suppose expert panel
judgement has been based on what was observed to happen in a parallel epidemic across farms when no
controls were in place. If there were common agreement that exposure, appropriately measured, really did
‘cause’ future infection, then a substantive, but plausible, hypothesis would be that the effects on the spread
of the infection, if exposure were to be controlled by culling, could be identified with the effects observed
when the exposure levels induced by this culling regime had occurred naturally. Probability predictions
associated with this sort of
control could then be inherited from the results of the observational study on the speed of disease spread
when no control is exerted. The efficacy of various possible culling regimes could then be scored even if
these regimes had never previously been enacted. Indeed this type of extrapolation is so common that
the hypotheses on which it is based sometimes go unnoticed. Note that one particular consequence of this
assumption is that different culling regimes giving rise to the same exposure profile, all other factors being
equal, would be equally scored.

Within our context, it is helpful to extend the usual causal assumptions for three reasons. First, we may 
not need to require that the conditions hold for all possible levels of components of Y, but only a subset of 
these levels, broadly those that might improve, in some sense, the outcomes within the unfolding crisis. In 
practice only a very small proportion of the possible types of intervention considered in a causal model are
usually entertained. It is, therefore, useful to acknowledge this less stringent assumption. Second, we might 
want the flexibility not to map the effects of enacting a control, but a rather more complex decision. This 
then embellishes D. Last, the natural comparator d° for predicting what might happen may not be doing 
nothing but rather following routine procedures or past protocols. For all these reasons we have found it very
useful to generalise these definitions as above.
3.2.2 Causal hypotheses and likelihood separation


A second type of causal assumption is a useful addition to the CK class because it enables panels to accommodate
experimental as well as observational data. Our motivation stems from work by Cooper and Yoo
(1999). They developed collections of additional assumptions which enabled formal learning of the parameters
of discrete BNs when data was not only observational but could also come from designed experiments.
They noted that if the BN was causal, in the sense given above where $D$ contained the controls imposed on
explanatory variables commonly used in setting up experiments, then experimental data could be introduced
in a simple way. This technology has since been transferred to other domains (e.g. Freeman and
Smith 2011; Cowell and Smith 2014).


Different panels in most IDSSs will want to accommodate not only observational and survey data but
also experimental data. Here covariates are controlled and set to certain values and the subsequent effect on
a response variable observed. So, to be able to accommodate relevant scientific evidence, most operational
IDSSs will need to assume causal hypotheses concerning controlled experiments overseen by a relevant panel
as this applies to an unfolding crisis. Interestingly, it has been shown (see Daneshkhah and Smith 2004), at
least within the context of BNs, that the panel independence assumption necessary to ensure distributivity
of an IDSS is intimately linked with, and plausible only when, certain causal hypotheses can be entertained.
These causal hypotheses concern not only experiments within a module used by a panel but also across the
interface of the modules. Since the first class of hypotheses is typically a function of a single decision support
system, in this paper we will concentrate mainly on the second case.

Definition 3.5. Call an IDSS $e^t$-panel compatible for a collection of experiments $e^t = (e_k^t)_{k \in [p]}$ leading to
a data set $x^t$ if the likelihood associated with $e^t$ can be written as

$$l^e(\theta \mid t(x^t)) = \prod_{i \in [m]} l_i^e(\theta_i \mid t_i(x^t))$$

where $l_i^e(\theta_i \mid t_i(x^t))$ is a function only of the parameters $\theta_i$ overseen by a single panel $G_i$, $i \in [m]$.

By far the most common such collection of experiments is one composed of collections of independent 
experiments that can be partitioned so that each experiment is informative only about the parameters 
overseen by each particular panel. Again an important special case when this separation will be automatic is 
when the overarching qualitative structure is a BN and this BN is causal under experimental manipulations. 

Lemma 3.2. Suppose an IDSS is $(D, d^0)$-determined where $d^0 \in D$ consists of no intervention on the system.
Then the IDSS is $e^t$-panel compatible whenever $e^t$ is made up of components $e_k^t$ consisting of independent,
double blind, randomised, designed experiments where the response random vector $Y_i$ denotes the observed
number of units in a causal BN when its parents take a particular possible configuration of
values, $i \in [m]$, $k \in [p]$.

Proof. One of the properties of a causal BN is that the probability of a random sample of a set of observations
of one of its nodes when its parents are manipulated to take a certain value can be equated with the probability
of observing this sample when it is idle - i.e. when there is no intervention. In particular this means that
the $\theta_i$ appearing in the manipulated experiment is equal to $G_i$’s corresponding parameters of interest in the
observational setting where $d^0 \in D$, $i \in [m]$. But because this is so for the idle control $d^0$, since the IDSS is
$(D, d^0)$-determined, it is also true for all $d \in D$. □

Note that, as in the remarks of the last section, this is not the only class of experiments to have this
property but simply an important special case. The directly analogous definitions of causal hypotheses as
they apply to different overarching graphical structures like chain event graphs or MDMs also admit such a 
factorisation. 
4 Examples of sound and distributive frameworks

We saw in Section 3 that, provided a system is such that certain qualitative properties existed over the
parameters of a composite model and the likelihood of information separated over parameters in an appropriate
way, then the composite system should remain distributed with all the implementation, interpretative
and computational advantages that confers. But how common are such systems? The answer is that whilst 
many systems violate the conditions needed, many others satisfy them. So by choosing panels appropriately 
and by demanding that only certain types of unambiguously interpretable data are entered into the IDSS, 
it is often possible to build such distributed systems. In this section we present some well known settings 
where this is possible and one where it is not. 

4.1 Stochastic and value independence 

We begin with a trivial system made up of independent components and linear utility. 

Definition 4.1. A centre’s reward vector $Y = (Y_i)_{i \in [m]}$ is said to have value independent attributes,
and the utility has a linear form, if $U(d, y)$ can be written as

$$U(d, y) = \sum_{i \in [m]} k_i(d)\, U_i(d, y_i), \tag{4.1}$$

where $\sum_{i \in [m]} k_i(d) = 1$, $k_i(d) \in (0, 1)$ are called criterion weights and $\sup_{d, y_i} U_i(d, y_i) = 1$, $i \in [m]$, $d \in D$.

Suppose the CK-class contains the hypothesis that all potential users will have value independent attributes.
Assume further that the vectors $Y_i$ are mutually independent for any given $d \in D$ so that
$\perp\!\!\!\perp_{i \in [m]} Y_i \mid \theta_i, d$, where the distribution of $Y_i$ is a known deterministic function of $\theta_i$ and $d \in D$, $i \in [m]$. For
each $d \in D$, let the expected utility $\bar{U}_i(\theta_i, d)$ denote the expectation of $U_i(d, y_i)$ over $Y_i \mid \theta_i, d$, and $\bar{U}_i(d)$ the
expectation over $Y_i \mid d$, $i \in [m]$. Then

$$\bar{U}(d) = \sum_{i \in [m]} k_i(d)\, \bar{U}_i(d),$$

where

$$\bar{U}_i(d) = \int_{\theta_i \in \Theta_i(d)} \bar{U}_i(\theta_i, d)\, \pi_i(\theta_i \mid d)\, d\theta_i,$$

and $\pi_i(\theta_i \mid d)$ is SB’s/$G_i$’s prior on $\theta_i \mid d$, $d \in D$. This system is clearly a priori distributive. So the SB can
devolve her calculations of the expected utility of each $d \in D$ to the relevant panels. She can then use these
expected utility evaluations with a user’s input $k(d) = (k_i(d))_{i \in [m]}$ to evaluate $\bar{U}(d)$ and hence discover an
optimal policy (as in Dodd and Smith 2012; Smith and Dodd 2012).
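
The devolved calculation just described can be sketched in a few lines. This is only an illustration under stated assumptions: two hypothetical panels, two policies `d1` and `d2`, discrete priors in place of the integral, a conditional expected utility equal to $\theta_i$ itself, and criterion weights that do not vary with $d$. All numbers are invented.

```python
def panel_expected_utility(prior, utility_given_theta):
    """Panel-side step: average U_i(theta_i, d) over a discrete prior on theta_i."""
    return sum(weight * utility_given_theta(theta) for theta, weight in prior.items())

def overall_expected_utility(weights, panel_scores):
    """SB-side step: U(d) = sum_i k_i * U_i(d) for one policy d."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("criterion weights must sum to one")
    return sum(k * u for k, u in zip(weights, panel_scores))

def identity(theta):
    # Assumed form of the conditional expected utility, for illustration only.
    return theta

# Each panel delivers one score per policy, computed once from its own prior.
scores = {
    "d1": [panel_expected_utility({0.9: 0.5, 0.7: 0.5}, identity),
           panel_expected_utility({0.3: 1.0}, identity)],
    "d2": [panel_expected_utility({0.5: 1.0}, identity),
           panel_expected_utility({0.6: 0.5, 0.8: 0.5}, identity)],
}

def best_policy(weights):
    """Rank policies for a given user's criterion weights, reusing delivered scores."""
    totals = {d: overall_expected_utility(weights, s) for d, s in scores.items()}
    return max(totals, key=totals.get), totals

print(best_policy([0.5, 0.5]))
print(best_policy([0.9, 0.1]))
```

Changing the weights changes the preferred policy here, but only `best_policy` is re-run: the panels' delivered scores are reused as they stand.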


Note that in this scenario no panel needs to contemplate any issue other than those concerning its own 
domain of expertise. The IDSS can simply re-evaluate scores if changing circumstances demand that the 
user’s priorities, reflected in her choice of criterion weights, need to be modified, using the original delivered 
information. However, as soon as the utility is not linear, or there is a dependence induced across the
components $Y_i$, $i \in [m]$, then this need no longer be the case (Leonelli and Smith, 2015).


4.2 Staged trees 

Perhaps the simplest overarching structure within which to express forms of dependence is the event tree
$T(d)$, for each $d \in D$. Suppose that panels can agree the topology of the underlying event tree $T(d)$ for each
possible decision describing the set of corresponding unfolding events. Suppose further that all agree that
panel $G_i$, $i \in [m]$, should deliver the edge probability vectors $\theta_{ij}$ associated with edges emanating from each
non-leaf vertex $v_{ij}$, $j \in [m_i]$. Let $\theta_{ij} = (\theta_{ijk})_{k \in [m_{ij}]}$, $\theta_i = (\theta_{ij})_{j \in [m_i]}$ and $\theta = (\theta_i)_{i \in [m]}$. Then, for a random
sample $x^t$,

$$l(\theta \mid x^t) = \prod_{i \in [m]} l_i(\theta_i \mid t_i(x^t)),$$

where

$$l_i(\theta_i \mid t_i(x^t)) = \prod_{j \in [m_i]} l_{ij}(\theta_{ij} \mid t_i(x^t)),$$

and

$$l_{ij}(\theta_{ij} \mid t_i(x^t)) = \prod_{k \in [m_{ij}]} \theta_{ijk}^{x_{ijk}},$$

where $\sum_{k \in [m_{ij}]} \theta_{ijk} = 1$ and $x_{ijk}$ is the number of units in the sample reaching vertex $v_{ij}$ and then proceeding
down the $k$th edge, $i \in [m]$, $j \in [m_i]$ and $k \in [m_{ij}]$. Here $t_i(x^t) = \{x_{ijk} : j \in [m_i], k \in [m_{ij}]\}$. Clearly, in many
cases, including complete datasets, this likelihood separates over the panels. It can also be shown that if the
event tree is causal, as defined in Cowell and Smith (2014), Shafer (1996) and Thwaites et al. (2010), then
the likelihood remains separable.
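
The separation of this multinomial likelihood can be made concrete with a small sketch. It assumes a hypothetical tree in which panel `G1` owns one vertex and panel `G2` owns two, with invented edge counts. The Dirichlet prior used in `dirichlet_update` is an additional assumption made for the illustration (the standard conjugate choice for multinomial counts) rather than something the text above requires.

```python
import math

# counts[panel][vertex] lists x_ijk for the edges leaving that vertex (made-up data).
counts = {
    "G1": {"v11": [12, 8]},
    "G2": {"v21": [5, 3, 2], "v22": [1, 9]},
}
# A candidate value of theta, laid out in the same way.
theta = {
    "G1": {"v11": [0.6, 0.4]},
    "G2": {"v21": [0.5, 0.3, 0.2], "v22": [0.1, 0.9]},
}

def panel_log_likelihood(panel_theta, panel_counts):
    """log l_i: sum over the panel's vertices and edges of x_ijk * log(theta_ijk)."""
    return sum(x * math.log(p)
               for vertex, xs in panel_counts.items()
               for p, x in zip(panel_theta[vertex], xs))

def dirichlet_update(prior_alpha, panel_counts):
    """Conjugate update of one panel's Dirichlet hyperparameters from its own counts."""
    return {vertex: [a + x for a, x in zip(prior_alpha[vertex], xs)]
            for vertex, xs in panel_counts.items()}

# The full log-likelihood is the sum of terms each panel can evaluate on its own.
log_likelihood = sum(panel_log_likelihood(theta[g], counts[g]) for g in counts)

# Panel G2 updates its beliefs using only its own sufficient statistic t_2(x^t).
posterior_g2 = dirichlet_update({"v21": [1, 1, 1], "v22": [1, 1]}, counts["G2"])
print(log_likelihood, posterior_g2)
```

Each panel touches only the counts at its own vertices, which is the practical content of the factorisation above.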


An example of this sort of structure comes from level activity modelling in forensic science (Cowell et al.
2011; Dawid et al. 2007). When collecting evidence concerning a particular criminal prosecution, inference
can sometimes be expressed in terms of such a tree where the different panels will be the jury members,
the forensic experts, and the court recorders: see Smith (2010) for an example in this context. Then under
panel independence the whole system will enjoy the property of being distributed. Note that in this sort
of application, when the prosecution and defence case have a different narrative about an activity, e.g. the 
defence case asserts the suspect was not at the scene of the crime whilst the prosecution case asserts that 
the suspect went through a sequence of actions, then the corresponding event tree is very asymmetric. In 
these sorts of examples, building the IDSS on a BN would be inefficient and contrived. This is one reason 
why our theoretical development is not embedded in a BN framework, but instead develops a methodology 
which extends to BNs as a special case. 


4.3 Bayesian networks and chain graphs 

Currently, the most well developed and established qualitative structure is the BN. In our context the different
panels would be asked to agree to a particular directed acyclic graph (DAG) of the BN. The responsibility
for the delivery of different conditional probability tables (or more generally rows in these tables) would be
partitioned out across the different panels. In this setting, global independence and local independence are
sufficient for panel independence. In practice, these assumptions are almost always made in the more usual
one-panel setting (Spiegelhalter and Lauritzen 1990). These make probability elicitations and computations
much more straightforward than they would otherwise be. The separation of the likelihood for both discrete
and continuous BNs under complete or ancestral sampling has long been recognised (e.g. Lauritzen 1996;
Smith 2010), as has the closure to designed experiments (e.g. Cooper and Yoo 1999).

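Let $\mathcal{G}$ be a BN with vertex set $\{Y_i : i \in [n]\}$ and $\{B_i : i \in [m]\}$ be a partition of $[n]$. Suppose panel $G_i$
oversees $Y_{B_i} = (Y_j)_{j \in B_i}$, $i \in [m]$. In this setting panel independence implies

$$\pi(\theta \mid d) = \prod_{i \in [m]} \pi_i(\theta_{B_i} \mid d), \quad \text{and} \quad f(y) = \prod_{i \in [m]} f_i(y_{B_i} \mid y_{\Pi_{B_i}}),$$

where $\Pi_{B_i} = \cup_{j \in B_i} \Pi_j$, $\Pi_j$ is the index parent set of $Y_j$, $\theta_j$ being the parameter associated
to a vertex $Y_j$, and

$$f_i(y_{B_i} \mid y_{\Pi_{B_i}}) = \int_{\Theta_{B_i}(d)} f_i(y_{B_i} \mid y_{\Pi_{B_i}}, \theta_{B_i})\, \pi_i(\theta_{B_i} \mid d)\, d\theta_{B_i},$$

where $\Theta_{B_i}(d)$ is the parameter space of panel $G_i$, $d \in D$.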
Furthermore, provided that $x^t$ is a complete random sample from the same population as $Y$, the posterior
density can be written as

$$\pi(\theta \mid x^t, d) = \prod_{i \in [m]} \pi_i(\theta_{B_i} \mid x^t_{B_i}, d).$$

For object-oriented BNs, which construct large, overarching BNs through copying parts of their specification
(Koller and Pfeffer 1997), we would need the additional condition that all replicates of a probabilistic ‘object’
were the responsibility of a single panel. But since such replicates are asserting that the parameters of certain
aspects of the model are identical this condition would, in practice, normally be satisfied. For a chain graph
(Wermuth and Lauritzen 1990; Frydenberg 1990; Lauritzen and Richardson 2002), provided that all the
probability specifications concerning variables in a given chain component are always the responsibility of
a single panel, the assumption of prior panel independence can be made consistently and is natural. Again
under complete or ancestral sampling the likelihood also separates over the panels, which was exploited for
the combination of expert judgement in Faria and Smith (1997).

Although some common BNs are often coded as discrete or Gaussian and supported by a conjugate
analysis, this is not a requirement for their definition via the semi-graphoid axioms (stated on page 38).
For example any panel’s distribution can be extremely complex, involving many latent variables which are
later marginalised out to provide the required, sometimes non-parametric, posterior distributions over many
or even an infinity of parameters. Usually then posterior distributions will not have an explicit algebraic
form, but constitute large sample approximations from the complex system. In our context it would be
extremely cumbersome, even in small scale problems, for a panel to communicate such information in its
entirety to the system. However, in a rather different context of addressing the challenges of inference
using massive data sets, Bissiri et al. (2013) have already noted that only certain moments are needed for
the calculations of expected utilities for certain classes of utilities. This property can be exploited when
considering an overarching framework of a BN when one component of that system is very large (see below
and e.g. Lauritzen 1992; Nilsson 2001). This is also the case for an IDSS. It is interesting to note that,
in fact, propagation algorithms for general BNs in terms of lower order moments and general distributional
assumptions have been known for a long time.


4.4 Decomposable graphs and cliques of panellists 

One class of problems where the sorts of conditions needed for separation can be fierce is when the overarching
model is an undirected graph (Lauritzen 1996). From Dawid and Lauritzen (1993) we know that a sensible
way to build panels would be by letting each oversee a clique of an agreed decomposable undirected graph,
since distributed inferences can then be carried out for complete observations. However, now two different
panels overseeing cliques adjacent to one another will both have responsibility for parameters of the margins
of variables lying on the separator. Therefore variational independence does not hold, panel independence
is broken and there is a danger that the system is not well defined. The two panels need to agree on the same
prior over this separator, which can be ensured, for example, by imposing into the CK-class that the centre’s
beliefs are strong hyper-Markov (Dawid and Lauritzen 1993). Even then, if the two adjacent panels update their
beliefs autonomously with different sets of information, then there is no guarantee that the resulting two
posterior distributions of the two different panels will remain hyper-Markov.

One simple solution to this problem, when the graph is decomposable, is to give precedence to one
panel’s information about a particular separator and ignore all others’. This is equivalent to selecting a
BN representation of this decomposable graph where responsibility for a particular separator is delegated
entirely to the panel delivering the parent clique probabilities. The other panel is then only responsible for
delivering the probability judgements about its clique probabilities conditional on the values of the separator:
see related issues associated with meta-analyses of incomplete data (e.g. Massa and Lauritzen 2010; Jirousek
and Vejnarova 2003). In these sorts of contexts the standard Bayesian paradigm may well not be ideal. More
expressive inference, perhaps using belief functions (Shafer 1976), might be more appropriate to represent
the composition of beliefs in such settings.


4.5 MDMs and uncoupled dynamic BNs 


BN time series models, such as the general dynamic BN (Koller and Lerner 2001; Murphy 2002) or the
dynamic matrix-variate graphical model (Carvalho and West 2007), useful for modelling different components
of the system, do not exhibit the required distributive properties, so these do not provide good frameworks
for the overarching system. However, the class of MDM models (Queen and Smith 1993) does (see, for
applications of this class, Anacleto et al. 2013; Costa et al. 2015; Queen 1994). This essentially models
a multivariate time series $\{Y(t)\}_{t \in \mathbb{Z}_{\geq 1}} = \{Y_i(t) : i \in [n]\}_{t \in \mathbb{Z}_{\geq 1}}$ as a DAG whose vertices are univariate
processes and whose topology remains fixed through time. The univariate time series are modelled with a
regression dynamic linear model (West and Harrison 1997), where the regressors are the contemporaneous
time series. Importantly both the predictive, $f(y(t) \mid y^{t-1})$, where $y^{t-1} = (y(s))_{s \in [t-1]}$, and conditional,
$f(y(t) \mid \theta(t), y^{t-1})$, distributions of an MDM factorize according to the underlying DAG. This ensures
distributivity. Specifically, for $i \in [n]$ we assume:


$$Y_i(t) = \theta_i(t) F_i(t)^\top + v_i(t), \qquad v_i(t) \sim (0, V_i(t)); \tag{4.2}$$
$$\theta(t)^\top = G(t)\,\theta(t-1)^\top + w(t), \qquad w(t) \sim (0, W(t)); \tag{4.3}$$
$$\theta(1) \mid I^0 \sim (m(1), C(1)). \tag{4.4}$$


Here $F_i(t) \in \mathbb{R}^{s_i}$, $s_i \in \mathbb{Z}_{\geq 1}$, is a vector of regressors, known functions of $Y_{\Pi_i}(t)$ and $Y^{t-1}$, where $\Pi_i$
indexes the parents of $Y_i(t)$ in the DAG. The parameter vector $\theta(t) = (\theta_i(t))_{i \in [n]} \in \mathbb{R}^{s}$ has elements
$\theta_i(t) \in \mathbb{R}^{s_i}$, where $\sum_{i \in [n]} s_i = s$. The observation error $v_i(t)$ is univariate and $V_i(t) \in \mathbb{R}_{>0}$, which can
be assumed either known or unknown, is the observational variance, often assumed constant through time.
The known matrices $G(t) = \mathrm{blockdiag}(G_1(t), \ldots, G_n(t))$ and $W(t) = \mathrm{blockdiag}(W_1(t), \ldots, W_n(t))$ have
dimension $s \times s$, where $G_i(t)$ and $W_i(t)$ are the $s_i \times s_i$ state evolution matrix and state evolution
covariance matrix for $\theta_i(t)$, respectively, $i \in [n]$. The $s$-dimensional vector $w(t) = (w_i(t))_{i \in [n]}$, called
the system error vector, has elements $w_i(t)$ of dimension $s_i$. The errors $v_1(t), \ldots, v_n(t), w_1(t), \ldots, w_n(t)$
are assumed independent of each other and through time. The vector $m(1) \in \mathbb{R}^{s}$ and the $s \times s$ covariance
matrix $C(1) = \mathrm{blockdiag}(C_1(1), \ldots, C_n(1))$ are the first two moments of the distribution of $\theta(1) \mid I^0$,
where $I^0$ represents the initial information available.
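
The block-diagonal form of $G(t)$ and $W(t)$ means that each panel can propagate the moments of its own state
without reference to the other panels. The following Python sketch illustrates this for one time step. It is only
an illustration under simplifying assumptions: every state $\theta_i(t)$ is taken to be scalar ($s_i = 1$), and the
function name `evolve_panel_moments` and all numerical inputs are hypothetical, not taken from the paper.

```python
def evolve_panel_moments(m, C, G, W):
    """Propagate per-panel state moments through theta_i(t) = G_i * theta_i(t-1) + w_i(t).

    m, C: prior means and variances of the scalar states, one entry per panel.
    G, W: known evolution coefficients and evolution variances, one entry per panel.
    Each panel's update uses only its own entries, mirroring the block-diagonal structure.
    """
    m_next = [g * mi for g, mi in zip(G, m)]
    C_next = [g * ci * g + w for g, ci, w in zip(G, C, W)]
    return m_next, C_next


# Made-up numbers for two panels, for illustration only.
m_next, C_next = evolve_panel_moments(m=[1.0, 2.0], C=[0.5, 1.0], G=[1.0, 0.5], W=[0.1, 0.2])
```

Since no entry of one panel enters the update of another, the same function could be run separately by each panel
on its own inputs.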

To illustrate the features of the MDM model, consider the DAG of a naive food supply model in Figure 2
and suppose panel $G_i$ oversees the time series $\{Y_i(t)\}_{t \in \mathbb{Z}_{\geq 1}}$ for each decision $d \in D$. It can be shown that
the joint distribution of $(Y(t), \theta(t))$ given the past factorizes as

$$f(y(t), \theta(t) \mid I^{t-1}, d) = \prod_{i \in [4]} f_i(y_i(t) \mid y_{\Pi_i}(t), \theta_i(t), I^{t-1})\, \pi_i(\theta_i(t) \mid I^{t-1}, d).$$

So note the critical feature that the panel parameters are independent of each other at every time point
conditional on the past. As mentioned, the predictive distribution also factorizes accordingly as

$$f(y(t) \mid y^{t-1}) = \prod_{i \in [4]} g_{t,i}, \tag{4.5}$$

where

$$g_{t,i} = \int f_i(y_i(t) \mid y_{\Pi_i}(t), y^{t-1}, \theta_i(t))\, \pi_i(\theta_i(t) \mid y_{\Pi_i}^{t-1}, y_i^{t-1}, d)\, \mathrm{d}\theta_i(t).$$


Although the predictive joint distribution of $Y(t)$ is not Gaussian, indeed its distribution can be very
complex, the joint moments of its predictives are straightforward to calculate and are polynomial functions of
various posterior moments that can be autonomously calculated by the different panel members (see Queen
et al. 2008 for an explicit derivation of some of these recurrences). As discussed in Leonelli and Smith
(2015), in this framework, expected utilities are often functions of these moments. Therefore, panels need
only to deliver to the system a few summaries, making the calculation of expected utility scores very fast.

Note that for modularity to hold we do not need any distributional assumptions other than panel
independence, but simply, as for the BN, that the individual component processes can be partitioned so that
exactly one panel is responsible for the development of that process, given its parents. This is because
distributivity derives only from the assumed conditional independence structure and not from the distributional
assumptions about the parameters or the conditional distribution of responses given parents and these
parameters. So instead of components being dynamic regression models, they can essentially have an arbitrary
structure, provided they give an explicit distribution for the response variable given the parameters and the
configuration of the parents.

[Figure 2 shows a chain DAG: Production: $Y_1(t)$ → Distribution: $Y_2(t)$ → Availability: $Y_3(t)$ → Health: $Y_4(t)$.]

Figure 2: A naive DAG for food security.


5 Panel summary recursions for a dynamic IDSS 


5.1 An instance of a dynamic CK-class 


In Leonelli and Smith (2015) and Leonelli (2015) five general structural properties were discovered which
arise as a consequence of the axioms of Section 2. We were able to show that these properties ensured
that the expected utilities needed by the IDSS to score options could be calculated using various tower
rule expansions. Critically, we showed that each term in these expansions, usually a polynomial, could be
calculated autonomously by a single panel, once it had communicated with its neighbours, so the required
expectations then could be calculated through appropriate message-passing between constituent panels.
Many of the classes of models we discuss there satisfy the conditions needed in this paper. The general forms
of the tower rule formulae given in Leonelli and Smith (2015) and Leonelli (2015) needed to identify expected
utility maximising decisions, although straightforward, are very general and rather technical. So here we
restrict ourselves to illustrating how these message-passing algorithms might work for the elementary system
given in Figure 2.

Thus suppose the space of possible decisions, $D$, is sufficiently small for it to be plausible to evaluate the
expected utility of every policy separately, one at a time. Without loss of generality, and for simplicity, we drop
the $d \in D$ index, and present the formulae needed for the expected utility of a fixed decision to be calculated. Let
$Y^\top = (Y(1)^\top, \ldots, Y(T)^\top)$ be a random vector observed over a finite time horizon $T$, where $Y(t)$ is the
vector at time $t \in [T]$. Let $Y(t)^\top = (Y_1(t)^\top, \ldots, Y_4(t)^\top)$, where $Y_i(t)$ is under the responsibility of panel $G_i$,
and let $Y_i(t)^\top = (Y_i^1(t), \ldots, Y_i^r(t))$ be a vector of variables observed over $r$ different locations in space, $i \in [4]$,
$t \in [T]$. Suppose that the family $U$ in the CK class demands that the delivered utility function is linear over
time, space and panel index. It follows that $U$ must take the form:

$$U = \sum_{i \in [4]} \sum_{t \in [T]} \sum_{l \in [r]} k_{til} U_{til}.$$

Assume further that the marginal utilities $U_{til} = -\gamma_i(t) Y_i^l(t)^2$, where $\gamma_i(t) \in \mathbb{R}$. Here the criterion weights
$k_{til}$ are fixed constants, having been previously elicited.

Assume panel $G_1$ uses any dynamic probabilistic model of arbitrary complexity, able to deliver as output
the first two moments of the distribution of $Y_1(t)$, $t \in [T]$. Let $a_1(t)$ and $C_1(t)$ denote the mean and
covariance matrix, respectively, of $Y_1(t)$, where

$$C_1(t) = \left[ c_1^{p,s}(t) \right]_{p,s}.$$

Also let $a_1^l(t)$ and $c_1^l(t)$ denote the mean and the variance of $Y_1^l(t)$, and let $c_1(t) = (c_1^1(t), \ldots, c_1^r(t))$.

The second panel, $G_2$, uses as its model of $Y_2(t)$ $r$ different MDMs. Equation (4.2) then becomes:

$$Y_2^l(t) = Y_1(t)^\top \theta_2^l(t) + v_2^l(t), \qquad v_2^l(t) \sim (0, V_2^l(t)),$$

where $\theta_2^l(t)$ is an $r$-dimensional vector of unknown parameters and $E(V_2^l(t)) = b_2^l(t)$. The system equations
(4.3) therefore become:

$$\theta_2^l(t) = G_2^l(t)\, \theta_2^l(t-1) + w_2^l(t),$$

where $G_2^l(t)$ is an $r \times r$ matrix having known entries and

$$w_2^l(t) \sim (0, W_2^l(t)), \qquad W_2^l(t) = \left[ w_{2,l}^{i,j}(t) \right]_{i,j}$$

is a matrix. Here the panel has ascertained that $\theta_2^l(1)$ has a prior distribution with mean $a_2^l$ and covariance $C_2^l$,
where

$$C_2^l = \left[ c_{2,l}^{i,j} \right]_{i,j}.$$

Panel $G_3$ also employs a simple linear MDM defined, for $l \in [r]$, by the following observation and system
equations:

$$Y_3^l(t) = \theta_{23}^l(t) Y_2^l(t) + v_3^l(t), \qquad \theta_{23}^l(t) = \theta_{23}^l(t-1) + w_{23}^l(t).$$

The errors $v_3^l(t)$ and $w_{23}^l(t)$ are assumed by $G_3$ to be mutually independent with mean zero and variance
$V_3^l(t)$ and $W_{23}^l(t)$, respectively. These variances are assumed to be unknown, but the panel provides a prior
mean $b_3^l(t)$ for $V_3^l(t)$ and $r_{23}^l(t)$ for $W_{23}^l(t)$. Prior means and variances are delivered also for the parameters
$\theta_{23}^l(1)$ and denoted as $a_{23}^l$ and $c_{23}^l$.

The final panel, $G_4$, uses an emulator based on a complex deterministic simulator for each region; for our
example, the household demography around a given region. This model might be a simulator of purchasing
activities over the space of households, with runs covering a small sample from retail outlets in the region.
The inputs of each simulator are $y_3^l(t)$ together with other known constants, whilst its output is $y_4^l(t)$. The
emulator is then formally defined as

$$Y_4^l(t) = m(Y_3^l(t)) + e(Y_3^l(t)),$$

where $m(Y_3^l(t)) = \theta_{04}^l + \theta_{34}^l Y_3^l(t)$ and $e(Y_3^l(t))$ is a zero mean Gaussian process with covariance function
$W_4^l\, r^l(Y_3^l(t) - Y_3^l(t)')$, where $r^l$ is a stationary correlation function such that $r^l(0) = 1$, agreed by
panel $G_4$ using their expert judgements (see e.g. Kennedy and O’Hagan 2001) for this emulator. From the
emulator $G_4$ is now able to calculate, in particular, the moments of its parameters which will be needed by
the SB as

$$E(\theta_{04}^l) = a_{04}^l, \quad E(\theta_{34}^l) = a_{34}^l, \quad V(\theta_{04}^l) = c_{04}^l, \quad V(\theta_{34}^l) = c_{34}^l, \quad E(W_4^l) = r_4^l.$$

The algorithm, described in Leonelli and Smith (2015), enables us to customise a message-passing
algorithm for the overarching structure defined. This works backwards both through the DAG and over time,
from the last panellist at the last time point, and entails the computation of the function $\tilde{u}_{t,j}$, which includes
all the terms that are a function of $Y_j(t)$ in the backward routine, and its expectation $\bar{u}_{t,j}$. We will illustrate
the algorithm over just two time steps, so start from panel $G_4$ which oversees $Y_4(2)$. Here we set $\tilde{u}_{2,4} = U$,
and let $\hat{u}_{2,4} = U - \sum_{l \in [r]} k_{24l} U_{24l}$. Panel $G_4$ first computes $\bar{u}_{2,4}$, using the formula

$$\bar{u}_{2,4} = \hat{u}_{2,4} - \sum_{l \in [r]} k_{24l} \gamma_4(2) E(Y_4^l(2)^2) = \hat{u}_{2,4} - \sum_{l \in [r]} k_{24l} \gamma_4(2) \left( E(Y_4^l(2))^2 + V(Y_4^l(2)) \right)$$
$$= \hat{u}_{2,4} - \sum_{l \in [r]} \gamma_4(2) k_{24l} \left[ \tau_{04}^l + 2 a_{04}^l a_{34}^l E(Y_3^l(2)) + \tau_{34}^l E(Y_3^l(2)^2) \right],$$

where $\tau_{04}^l = (a_{04}^l)^2 + r_4^l + c_{04}^l$ and $\tau_{34}^l = (a_{34}^l)^2 + c_{34}^l$.
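
To make the step performed by panel $G_4$ concrete, the following Python sketch computes the second moment of a
single $Y_4^l(2)$ from the panel's delivered parameter moments and the first two moments of $Y_3^l(2)$. It is a sketch
under stated assumptions rather than the authors' implementation: it assumes that the two emulator coefficients,
the emulator error and $Y_3^l(2)$ are mutually independent, and the function name `emulator_second_moment` and the
numbers used are hypothetical.

```python
def emulator_second_moment(a04, c04, a34, c34, r4, ey3, ey3_sq):
    """E(Y_4^2) for Y_4 = theta_04 + theta_34 * Y_3 + e, given the moments of Y_3.

    a04, c04: mean and variance of theta_04; a34, c34: mean and variance of theta_34.
    r4: expected variance of the zero-mean emulator error e.
    ey3, ey3_sq: E(Y_3) and E(Y_3^2).
    Assumes theta_04, theta_34, e and Y_3 are mutually independent (sketch assumption).
    """
    tau04 = a04 ** 2 + r4 + c04   # E(theta_04^2) plus the expected error variance
    tau34 = a34 ** 2 + c34        # E(theta_34^2)
    return tau04 + 2 * a04 * a34 * ey3 + tau34 * ey3_sq


# Made-up inputs for a single region l.
value = emulator_second_moment(a04=1.0, c04=0.5, a34=2.0, c34=0.25, r4=0.1, ey3=3.0, ey3_sq=10.0)
```

Only five parameter summaries and two incoming moments are needed, which is the sense in which the panel's
message to its neighbour is a short polynomial in known quantities.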

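This function is then communicated to $G_3$. Because of the topology of the DAG in Figure 2, under the
notation in Leonelli and Smith (2015), $\tilde{u}_{2,3} = \bar{u}_{2,4} + \sum_{l \in [r]} u_{23l}$. So writing
$\iota_{4,3}^l = -2 \gamma_4(2) k_{24l} a_{04}^l a_{34}^l$ and

$$\hat{u}_{2,3} = \sum_{l \in [r]} \Big( \sum_{i \in [4]} k_{1il} U_{1il} + k_{21l} U_{21l} + k_{22l} U_{22l} - \gamma_4(2) k_{24l} \tau_{04}^l \Big),$$

it follows that $\bar{u}_{2,3}$ is then equal to

$$\bar{u}_{2,3} = \hat{u}_{2,3} - \sum_{l \in [r]} \left( \gamma_4(2) k_{24l} \tau_{34}^l + k_{23l} \gamma_3(2) \right) E(Y_3^l(2)^2) + \sum_{l \in [r]} \iota_{4,3}^l E(Y_3^l(2)),$$

where $E(Y_3^l(2)) = a_{23}^l E(Y_2^l(2))$ and $E(Y_3^l(2)^2) = \left( (a_{23}^l)^2 + r_{23}^l(2) + c_{23}^l \right) E(Y_2^l(2)^2) + b_3^l(2)$. Noting that
$\tilde{u}_{2,2} = \bar{u}_{2,3} + \sum_{l \in [r]} u_{22l}$, rearranging, we obtain the equation

$$\bar{u}_{2,2} = \hat{u}_{2,2} - \sum_{l \in [r]} \Big[ \left( \gamma_4(2) k_{24l} \tau_{34}^l + k_{23l} \gamma_3(2) \right) \left( (a_{23}^l)^2 + r_{23}^l(2) + c_{23}^l \right) + k_{22l} \gamma_2(2) \Big] E(Y_2^l(2)^2) + \sum_{l \in [r]} \iota_{4,3}^l a_{23}^l E(Y_2^l(2)), \tag{5.1}$$

where

$$\hat{u}_{2,2} = \sum_{l \in [r]} \Big( \sum_{i \in [4]} k_{1il} U_{1il} + k_{21l} U_{21l} - \gamma_4(2) k_{24l} \tau_{04}^l - \left( \gamma_4(2) k_{24l} \tau_{34}^l + k_{23l} \gamma_3(2) \right) b_3^l(2) \Big). \tag{5.2}$$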

At this stage panel $G_2$ performs a marginalisation step computing

$$E(Y_2^l(2)) = E(Y_1(2)^\top) G_2^l(2) a_2^l, \quad \text{and} \quad E(Y_2^l(2)^2) = E(Y_2^l(2))^2 + V(Y_2^l(2)), \tag{5.3}$$

where

$$E(Y_2^l(2))^2 = a_2^{l\top} G_2^l(2)^\top E(Y_1(2)) E(Y_1(2)^\top) G_2^l(2) a_2^l \tag{5.4}$$

and

$$V(Y_2^l(2)) = V(Y_1(2)^\top \theta_2^l(2)) + E(V_2^l(2)) = V(Y_1(2)^\top G_2^l(2) \theta_2^l(1)) + E(Y_1(2)^\top W_2^l(2) Y_1(2)) + b_2^l(2)$$
$$= V(Y_1(2)^\top G_2^l(2) a_2^l) + E(Y_1(2)^\top G_2^l(2) C_2^l G_2^l(2)^\top Y_1(2)) + E(Y_1(2)^\top W_2^l(2) Y_1(2)) + b_2^l(2)$$
$$= a_2^{l\top} G_2^l(2)^\top V(Y_1(2)) G_2^l(2) a_2^l + E\big(Y_1(2)^\top (G_2^l(2) C_2^l G_2^l(2)^\top + W_2^l(2)) Y_1(2)\big) + b_2^l(2). \tag{5.5}$$


Substituting the results of equations (5.2)-(5.5) into (5.1) panel $G_2$ can now compute $\bar{u}_{2,2}$. This function
is then sent to $G_1$. Recalling that $E(Y_1(2)) = a_1(2)$ and $V(Y_1(2)) = C_1(2)$, we see that

$$E\big(Y_1(2)^\top (G_2^l(2) C_2^l G_2^l(2)^\top + W_2^l(2)) Y_1(2)\big) = \sum_{i,j \in [r]} \left( a^{i,j} + w_{2,l}^{i,j}(2) \right) \left( a_1^i(2) a_1^j(2) + c_1^{i,j}(2) \right),$$

where $a^{i,j}$ is the entry in position $(i,j)$ of $A = G_2^l(2) C_2^l G_2^l(2)^\top$.
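
The identity used here is the standard expression for the expectation of a quadratic form: for a fixed matrix $M$
and a random vector with mean $a$ and covariance $C$, the expectation is the sum over $(i,j)$ of $M_{ij}(a_i a_j + C_{ij})$.
The short Python sketch below evaluates it with plain lists; the function name `expected_quadratic_form` and the
inputs are hypothetical, chosen only to illustrate the calculation panel $G_1$ would carry out.

```python
def expected_quadratic_form(M, a, C):
    """Return E(Y^T M Y) for a random vector Y with mean a and covariance C.

    Uses E(Y_i Y_j) = a_i * a_j + C_ij, so only the first two moments of Y are required.
    M, C: square matrices as lists of rows; a: list of means.
    """
    r = len(a)
    return sum(M[i][j] * (a[i] * a[j] + C[i][j]) for i in range(r) for j in range(r))


# Made-up two-dimensional example.
value = expected_quadratic_form(
    M=[[1.0, 0.0], [0.0, 2.0]],
    a=[1.0, 2.0],
    C=[[0.5, 0.1], [0.1, 0.3]],
)
```

In the setting of the text, `M` would play the role of $A + W_2^l(2)$, while `a` and `C` would be the mean and covariance
of $Y_1(2)$ delivered by panel $G_1$.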

By substituting these values into $\bar{u}_{2,2}$, panel $G_1$ can therefore now compute the value $\bar{u}_{2,1}$ it needs to
transmit. Proceeding in an analogous way, after more laborious but straightforward substitutions, we obtain
the score $\bar{U}$ for this policy, namely the polynomial:


u ( d ) = 22 22 tl (44(4 + 4(4(744 + 4(4(744 + 73474(4)) + 4™* (4 

te[2] /e[r] 

where 4(4 = hia\{t), r{(t) = a[{t) 2 + c[(t), 4(1) = 4s + (4s) 2 > 4( 2 ) = 4(1) + r 2 s( 2 ) and 

tLJ 2 ) = 4(2)4(2)+4(2)4+ 24 4 4 4 4 3 ai (2) T G'(2)4, 

T mix( 1 ) = 4(1)73(1) + 74(1) t 4 + 2ao4 a 34 a 23 a l(l) a 2i 

4(2) = 4 G l 2 (2) T (Ci(2) + ai(2)ai(2) T )G l 2 (2)a l 2 + b 2 (2) + ^ (A ld + wfyja i(2)4(2) + 4' J (2)) 

716 H 

4(1) = 4 T (Ci(2) + ai(2)ai(2) T )4 + 4(l)+ 22 4fj( a l(l)4(l) + 4’4i))- 

7!6[r] 

There are various points we would like to draw out of this example:

1. even for simple systems like the one above, the required formulae are quite involved and incorporate
uncertainty judgements in a subtle but appropriate way, unlike their naive plug-in analogues;

2. because of their symbolic integrity, the formulae to score different options and their associated
message-passing operations can be hard-wired into a system just like a propagation algorithm can for a BN.
These operations are usually simply substitutions, products and sums of known quantities and so fast
to calculate;

3. because these methods are symbolic, in particular, the same formulae but with different inputs can be
used for evaluating the scores associated with different policies. So computations can be parallelised;

4. the formulae only need the delivery of a small number of values from panels, making the necessary
calculations feasible and quick once the overarching structure and utility class have been agreed.


5.2 UK Food Security 


In the previous section we illustrated how the algorithms needed for an IDSS can be calculated using the 
structural information defining the underlying process. We now finish our discussion by outlining our ongoing 
work to produce such an IDSS to support UK local government officers to assess the impact that various 
government funding reductions might have on the food poverty of citizens for whom they have a responsibility 
of care. 

The first part of this task was to elicit from the decision-makers the purpose of the IDSS, the range of
decisions that were open to users, what the attributes of interest were, and how these were currently measured.
In the context of Warwickshire County Council, the decision space needed to be able to express how and what
services to provide in order to ameliorate the negative effects of government cuts. Through a preliminary
decision conference the clients identified that the bad effects of food poverty potentially manifested themselves
through impacts on educational attainment, health and social incohesion, expressed as the potential for
inciting food riots. In constructing a utility function based on these attributes, it appeared appropriate to
assume value independence (Keeney and Raiffa 1993). Let $Y_1$ denote measures of educational attainment,
$Y_2$ denote measures of health, $Y_3$ denote measures of social unrest and $Y_4$ denote cost. An appropriate family
of utility functions was then found to have the form:

$$U(y) = \sum_{i \in [3]} k_i \left( 1 - \exp(-c_i y_i) \right) + k_4 (a + b y_4), \tag{5.6}$$

where $y = (y_1, y_2, y_3, y_4)$ and whose parameters $(a, b, c_1, c_2, c_3)$ were then elicited.
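
As a small illustration of how a utility of the form (5.6) could be evaluated, the following Python sketch codes the
three exponential attribute terms and the linear cost term. It assumes criterion weights $k_1, \ldots, k_4$ multiplying
each term; the function name `utility` and every numerical value are hypothetical and are not the elicited
Warwickshire parameters.

```python
import math


def utility(y, k, c, a, b):
    """Additive utility: exponential terms for the first three attributes plus a linear cost term.

    y: attribute values (y1, y2, y3, y4); k: criterion weights (k1, ..., k4).
    c: risk-aversion parameters (c1, c2, c3); a, b: intercept and slope of the cost term.
    """
    benefit = sum(k[i] * (1.0 - math.exp(-c[i] * y[i])) for i in range(3))
    cost = k[3] * (a + b * y[3])
    return benefit + cost


# Made-up inputs: zero benefit attributes, so only the cost term contributes.
u = utility(y=[0.0, 0.0, 0.0, 2.0], k=[1.0, 1.0, 1.0, 1.0], c=[1.0, 1.0, 1.0], a=0.0, b=-1.0)
```

Because the form is additive across attributes, each term depends on a single attribute, which is what allows the
corresponding expectation to be assembled from summaries delivered by separate panels.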

From the elicitation we could see that any structural model had to support the prediction of $(Y_1, Y_2, Y_3, Y_4)$
as these evolved into the future as a response to each potential policy choice. Figure 3 gives a schematic
of both the main features of the process we capture in this framework and the constitution of panels over 
the system. The panels constituted for such an IDSS will often be chosen to mirror the panels that are 
already constituted for similar purposes, e.g. in the UK, the Office for Budget Responsibility, HM Treasury 
and The Confederation of British Industry all produce economic forecasts (Economy). Data sources on the 
different components are extensive and can be incorporated into the model by appropriate panels, although 
the existence of some gaps in the process makes the elicitation of some probabilistic expert judgements 
essential. Demography and Socio-economic status (SES) distributions are available from the Office for 
National Statistics (ONS); costs of housing, energy and general cost of living (CoL) via the consumer prices 
index (CPI); food trade (imports and exports) and farming yields from DEFRA; supply chain disruption 
and overall food availability from DEFRA, the food standards agency and the food and drink federation; 
access to credit from the Bank of England; household disposable income from ONS and Food costs via 
the cost of a typical basket of food, as used in the UK consumer prices index (ONS 2013). We need to
build separate probabilistic models of supply chains for various diverse categories of food which will then
combine together to provide forecasts of the costs over time of typical baskets of food for different categories
of impoverished households. At the time of writing, we have elicited and built prototype versions of
such dynamic probabilistic models for about 40% of these categories of food (Barons et al. 2015). Once
these are in place, the effect of different scenarios and strategic policies, to be elicited, can then be examined
via their composite utilities. In reality, some of this information is very rich. For example, the insurance
industry has extensive data on types and locations of all the food in transit, data which is fast-moving in
time, so the datasets informing these systems can be huge. But also note that typically we need only short
summary statistics for the purposes of the IDSS. The distributivity of the system we build also means that
different types of processes can be audited separately using analogues of the tower rule recurrences described
in the last section. These can, if necessary, be adjusted through discussions with relevant domain experts.
In this way, the system becomes a vehicle through which the user can come to understand and appreciate the
input of the diverse components of the system.

Figure 3: A plausible schematic of information flows for the modules of a UK food security IDSS. KEY: Economy: UK economic
forecasts; Demog.: demography; Farming: food production; SES: socio-economic status; Credit: access to credit; CoL: cost of
living; Food Avail: food availability; Supply disrup: food supply disruption; disp. inc.: household disposable income.


5.2.1 Some illustrative detail: GCSEs and free school meals 

To illustrate how the process works, we conclude by working through in slightly more detail a simplified
version of the explicit calculations made when examining the inputs concerning just one variable
of interest, educational attainment as measured through Impact Indicator 8 (II8), the percentage of pupils
gaining 5 or more GCSE qualifications, including maths and English, passed at grade C and above. Children
from low-income households are entitled to free school meals (FSM). A smaller proportion of pupils eligible
for FSM achieves this benchmark compared to their peers who are not eligible.

The panel with expertise in the expected distribution of household incomes will use an appropriate
probabilistic model, e.g. an MDM, to provide appropriate forecasts of numbers entitled to FSM together
with measures of uncertainty, reflecting the quality of the experimental evidence on the relationship. This
enables the next panel to predict the expectation and variance of II8 under differing policy implementations.
In this particular case, since this is a terminal panel in the IDSS, its output is an attribute of the utility
function, i.e. a reward. These assessments are donated to the panel calculating the effect on the future
predictions of the utility function. II8 is calculated using the simple formula:

$$Y_1(t) = \frac{1}{n} \sum_{j \in [n]} Y_j(t), \tag{5.7}$$

where $n$ is the number of pupils and $Y_j(t) \in [0,1]$ is an indicator variable on whether or not the individual
achieves the target at time $t$, the next time point. Assume an MDM holds given by:
$Y_j(t) = A(t) - B(t) Y_j^*(t) + e(t)$, where $Y_j^*(t)$ is a binary indicator of eligibility for free school meals and
the parameters $A(t)$, $B(t)$ have distributions derived from the previous time point. Then the polynomial
recurrences $E(Y_1(t))$ and $V(Y_1(t))$ can be derived as given in Section 4.

Within our actual system the following calculations need to be made numerically. But, as a simple
illustration of some of the functionality of the IDSS, suppose that the reward is Gaussian. Then it is easy
to check that the corresponding expected utility function is given by

$$\bar{U}(d) = 1 - \exp\left( -c \left\{ E(Y_1(t) \mid d) - \tfrac{1}{2} c V(Y_1(t) \mid d) \right\} \right). \tag{5.8}$$

This is clearly maximised when $E(Y_1(t) \mid d) - \tfrac{1}{2} c V(Y_1(t) \mid d)$ is maximised over $d \in D$. So we can see here
a clear trade-off between the predicted effect of a decision and its associated uncertainty. In particular,
it is apparent that we have automatically and explicitly accommodated into the scores the fact that effects
must be moderated by an associated uncertainty. Note that when the reward is non-Gaussian, the expression
(5.8) will be a function of the cumulant generating function evaluated at different points. The education
panel can, of course, simply calculate these outputs from its probabilistic model to produce the necessary
summaries.
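
The trade-off between predicted effect and uncertainty can be seen in a few lines of Python. The sketch below
scores each policy by the Gaussian expected utility of an exponential utility $1 - \exp(-cy)$ and picks the best one. It
is an illustration only: the function names, the policy labels and the forecast means and variances are all
hypothetical, and $c > 0$ is assumed.

```python
import math


def gaussian_expected_utility(mean, var, c):
    """Expected value of 1 - exp(-c * Y) when Y is Gaussian with the given mean and variance."""
    return 1.0 - math.exp(-c * (mean - 0.5 * c * var))


def best_policy(forecasts, c):
    """Return the policy label whose (mean, variance) forecast gives the highest expected utility."""
    return max(forecasts, key=lambda d: gaussian_expected_utility(*forecasts[d], c))


# Made-up forecasts of the attribute under two policies: (mean, variance).
forecasts = {"A": (0.60, 0.04), "B": (0.62, 0.20)}
choice = best_policy(forecasts, c=2.0)
```

With these illustrative inputs policy B has the slightly higher predicted mean, but its larger variance lowers its
score, so policy A is selected.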


5.2.2 Using the IDSS to feed back information 


[Figure 4 shows two regional maps, “Warwickshire Policy A” (left) and “Warwickshire Policy B” (right), each
displaying an indicator of prosperity and deprivation.]

Figure 4: UK food security: illustration of the use of regional maps for decision support. Here an indicator of prosperity shows
deprived areas in red. A clustering of deprivation (left map) is a risk factor for social unrest, such as food riots. Therefore,
policies which specifically reduce and fragment large areas of poverty (right map) are to be preferred.

Presently, the decision makers are presented with current and past data concerning relevant issues
displayed in graphs and maps within written reports. But these contain no annotated predictions of impact
on vulnerable communities of future events or the impact of central government changes in regulation. Nor
are evaluations given of the likely effectiveness of different implementations of changes, i.e. policy options.
We are currently transferring the output of our analyses to the same types of graphs and maps, but now
concerning predictions of the impact of future controls. In Figure 4 we illustrate how their familiar
demographic regional map can be used to display ‘hot-spots’ for food riot potential that might occur under
certain policies. Here the decision centre can immediately see how a decision is predicted to have a dramatic
deleterious effect in the more deprived areas of the region and, because of this, increase the risk of rioting.
We have shown that, despite the complexities of the food security system, we have been able to establish the
CK-class for this problem, to identify how to make a sound and distributed IDSS using expert judgement
based on suitable models, data and elicited domain knowledge, to accommodate rigorously the uncertainty
in the information and to provide the SB with an appropriate visualisation of the IDSS outputs to make
them usable.


6 Discussion 


In this paper we have been able to demonstrate that a formal and feasible IDSS can be designed to enhance
a decision centre’s decision-making capabilities. We noted that systems with an underlying clear causal
directionality are especially amenable to this type of support. What we usually need is for the functionality
of the support to be precisely defined, so that a suitable overarching framework can be agreed by all parties
- several illustrations of these are given above - and for the IDSS to have available sufficient numbers of expert
judgements to inform it.

Then it will be possible to continually update the system with the best of the evidence and do this in a 
distributed way that ensures it remains coherent, sound and feasible. Furthermore, the calculations of scores 
of policy options will often depend on the expectations of a few functions donated by autonomous panels, 
communicated to the other panels in simple ways so that uncertainties are appropriately incorporated into 
the analysis. 

Of course, as for any class of statistical model, the validity of the assumptions underpinning its evaluation 
needs to be checked against both domain knowledge and empirical evidence as this becomes available. As we 
stated earlier, under the conditions we derive above diagnostic checks of the various component models can 
be devolved to the responsible panel. But what if the structural assumptions within the CK class appear 
from data to be violated even though the individual components themselves perform well? 

This is a challenging issue and obviously depends critically on both the domain on which the IDSS applies
and the nature of the information sets. In crisis management systems like RODOS, as an incident unfolds such
structural reappraisal would be very difficult to enact in real time. However, within integrating systems for
policy decision support, like the food security system discussed above, such data-driven appraisal and revision
may be possible. The obvious way to proceed would be to set the observed consequences of policy choices
made using the system against their predictions. A natural framework for applying these techniques would
be the prequential approach (Dawid 1984), where model predictions of immediate consequences of enacted
policies are compared with what was actually observed to happen - for example in terms of comparisons of
observed and predicted marginal utility scores - a methodology we plan to fully develop in the future.
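
One simple way such a sequential comparison might be coded is sketched below in Python. The paper does not
specify a scoring rule, so the choices here are assumptions made purely for illustration: one-step-ahead forecasts
are summarised by a mean and a variance, they are treated as Gaussian, and they are scored with the cumulative
logarithmic score. The function name `prequential_log_score` and the numbers are hypothetical.

```python
import math


def prequential_log_score(forecasts, observations):
    """Sum of log predictive densities of the observations under Gaussian one-step-ahead forecasts.

    forecasts: sequence of (mean, variance) pairs, each issued before the matching observation.
    observations: sequence of realised values, in the same order.
    Higher (less negative) totals indicate forecasts that agreed better with what was observed.
    """
    total = 0.0
    for (mean, var), y in zip(forecasts, observations):
        total += -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)
    return total


# Made-up example in which both observations fall exactly on the forecast means.
score = prequential_log_score(forecasts=[(0.0, 1.0), (1.0, 1.0)], observations=[0.0, 1.0])
```

A persistently poor score for the overarching system, while each panel's own diagnostics remain acceptable, would
be one signal that the structural assumptions rather than the components deserve attention.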

Our focus here has been on networking probabilistic models. However this focus was determined mainly 
because probabilistic decision support tools are the most widely used of those that accommodate uncertainty 
in some way. We accept there are other modelling tools - for example those based on linear Bayes or belief 
function inferential schemes - which also have their own formal structures and using similar technologies can 
also ensure coherent and speedy integrating decision support. 

We have noted above that many of the computational forecasting formulae can be expressed algebraically
where the systems of equations can be deduced from the elicited overarching system. So, in particular,
computer algebra can be used to code up these equations which can then be examined for their algebraic
properties. As well as guiding the design of the network architecture of an IDSS, issues like the robustness
of the systems to variation in inputs can be formally and systematically analysed using these tools. We have
already made some tentative steps in this development (see Leonelli et al. 2015b).

Of course, in some situations the likelihood separation we need to ensure the enduring formal distributivity
of a formal IDSS will break down. Then we will need to fall back on approximate inferential methods that
preserve distributivity. However, preliminary results suggest that sensible approximate methods - whose form
is similar to variational Bayes methods - can still work effectively provided that statistical diagnostics are
employed to check that the family of models described by the overarching posited structure remains plausible
in the light of the information available. These results will be reported in due course.

We believe that, with the increasing demand for complex evidence-based models, the demand for
integrating decision support systems can only increase. If such systems do not encode uncertainty intelligently
then as statisticians we know that these will be liable to grossly mislead. But as we demonstrate here it
is straightforward to appropriately encode uncertainties into such a framework and in this way to properly
appraise the available options.

7 Appendix

7.1 Proof of Theorem 3.1

Fix a d £ T> and for simplicity suppress this dependence and the time index t. We need show that under the 
conditions of the theorem at any time t and under any policy the SB will continue to hold panel independent 
beliefs, i.e. that, for i £ [m], 

o, x e t - | /+, (7.i) 

and that when assessing 6 i, she will only use the information Gi would use if acting autonomously in assessing 
the information she needs to deliver, i.e. that 


Bi±I+ | Jo.Jii. 


(7.2) 


This is then sufficient for soundness and distributivity: even if all panellists could share each other's 
information then, given all panel beliefs, they would come to the same assessment about the joint distribution 
of the relevant parameters, namely that all panel subvectors are mutually independent and that the 
margins are simply those the associated panel would hold were it making decisions autonomously. 

The proof simply uses the semi-graphoid axioms of conditional independence as stated in Smith (1989): 

Symmetry: For any three vectors of measurements, $X$, $Y$, $Z$: 

$$X \perp\!\!\!\perp Y \mid Z \iff Y \perp\!\!\!\perp X \mid Z.$$

Perfect composition (called perfect decomposition in what follows): For any four disjoint vectors of 
measurements, $W$, $X$, $Y$, $Z$: 

$$X \perp\!\!\!\perp (Y, Z) \mid W \iff X \perp\!\!\!\perp Y \mid (W, Z) \;\;\&\;\; X \perp\!\!\!\perp Z \mid W.$$
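
The perfect composition axiom can be checked numerically on a small discrete example. The following is a minimal sketch, not part of the original proof: the helper names `marginal` and `is_ci`, the binary state spaces and the probability tables are all illustrative assumptions. The joint distribution is built so that $X \perp\!\!\!\perp (Y, Z) \mid W$ holds by construction, and the two statements on the right-hand side of the axiom are then verified.

```python
from itertools import product

def marginal(joint, idx):
    """Sum a joint table (dict: outcome tuple -> probability) down to coordinates idx."""
    out = {}
    for point, p in joint.items():
        key = tuple(point[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def is_ci(joint, a, b, c, tol=1e-12):
    """Check A independent of B given C through p(a,b,c) p(c) = p(a,c) p(b,c)."""
    p_abc = marginal(joint, a + b + c)
    p_ac, p_bc, p_c = marginal(joint, a + c), marginal(joint, b + c), marginal(joint, c)
    na, nb = len(a), len(b)
    for key, p in p_abc.items():
        va, vb, vc = key[:na], key[na:na + nb], key[na + nb:]
        if abs(p * p_c[vc] - p_ac[va + vc] * p_bc[vb + vc]) > tol:
            return False
    return True

# Coordinate positions of the four (binary) measurement vectors in each outcome tuple.
W, X, Y, Z = 0, 1, 2, 3

# Hypothetical tables: p(w), p(x | w) and p(y, z | w), so X is independent of (Y, Z) given W.
p_w = {0: 0.3, 1: 0.7}
p_x_w = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.6, 1: 0.4}}
p_yz_w = {0: {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4},
          1: {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.25, (1, 1): 0.25}}
joint = {(w, x, y, z): p_w[w] * p_x_w[w][x] * p_yz_w[w][(y, z)]
         for w, x, y, z in product((0, 1), repeat=4)}

lhs = is_ci(joint, (X,), (Y, Z), (W,))                                   # X indep (Y,Z) | W
rhs = is_ci(joint, (X,), (Y,), (W, Z)) and is_ci(joint, (X,), (Z,), (W,))
print(lhs, rhs)
```

The same checker returns `False` for `is_ci(joint, (Y,), (Z,), (W,))`, since $Y$ and $Z$ were deliberately made dependent given $W$ in these tables, which confirms that the test is not vacuous.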

We start by proving the panel independence condition in equation (7.1). Note that from common separability 
in equation (3.4) it follows that 

$$\theta_i \perp\!\!\!\perp \theta_{i^-} \mid I_0,$$

which, combined with the separately informed condition in equation (3.2) through perfect decomposition and 
using the symmetry property of the semi-graphoid axioms, gives 

$$\theta_{i^-} \perp\!\!\!\perp (I_{ii}, \theta_i) \mid I_0.$$

Using again perfect decomposition and symmetry, it follows that 

$$\theta_i \perp\!\!\!\perp \theta_{i^-} \mid I_{ii}, I_0. \tag{7.3}$$


Now the cutting condition in equation (3.3) together with equation (7.3) implies by perfect decomposition 
that 

$$\theta_i \perp\!\!\!\perp (I_*, \theta_{i^-}) \mid I_0, I_{ii}. \tag{7.4}$$

Then again by perfect decomposition we have that 

$$\theta_i \perp\!\!\!\perp \theta_{i^-} \mid I_0, I_{ii}, I_*.$$

Since $I_{ii}$ is a function of $I_*$ the above expression is equivalent to 

$$\theta_i \perp\!\!\!\perp \theta_{i^-} \mid I_0, I_*. \tag{7.5}$$


The delegatable condition in equation (3.1) can be written as 

$$I_+ \perp\!\!\!\perp (\theta_i, \theta_{i^-}) \mid I_0, I_*,$$

so, using perfect decomposition, it follows that 

$$I_+ \perp\!\!\!\perp \theta_i \mid I_0, I_*, \theta_{i^-}. \tag{7.6}$$

Combining equations (7.5) and (7.6) via perfect decomposition and using the symmetry property we have 
that 

$$\theta_i \perp\!\!\!\perp (I_+, \theta_{i^-}) \mid I_0, I_*.$$

So using again perfect decomposition it follows that 

$$\theta_i \perp\!\!\!\perp \theta_{i^-} \mid I_0, I_*, I_+,$$

which, since $I_*$ and $I_0$ are functions of $I_+$, can be written as equation (7.1). This shows that panel 
independence follows directly from the four conditions of Definition 3.2. 


To show that equation (7.2) holds, note that another implication of delegatability in equation (3.1) by 
perfect decomposition is that 

$$\theta_i \perp\!\!\!\perp I_+ \mid I_0, I_{ii}, I_*, \tag{7.7}$$

where again we used the fact that $I_{ii}$ is a function of $I_*$. Now noting that by perfect decomposition 
equation (7.4) implies that 

$$\theta_i \perp\!\!\!\perp I_* \mid I_0, I_{ii}, \tag{7.8}$$

it follows from equations (7.7) and (7.8) by perfect decomposition that 

$$\theta_i \perp\!\!\!\perp (I_+, I_*) \mid I_0, I_{ii},$$

which, since $I_*$ is a function of $I_+$, is equivalent to equation (7.2). 

7.2 Proof of Theorem 3.2 

Fix a policy $d \in \mathcal{D}$ and, for ease of notation, suppress this dependence. Under the initial 
hypotheses, by Theorem 3.1, 

$$\perp\!\!\!\perp_{i \in [m]} \theta_i \mid I_0^0, I_*^0,$$

implying that the prior joint density can be written in a product form 

$$\pi(\theta) = \prod_{i \in [m]} \pi_i(\theta_i).$$

It follows that under this admissibility protocol 

$$\pi(\theta \mid x^t) = \prod_{i \in [m]} \pi_i(\theta_i, t_i(x^t)),$$

where from the form of the likelihood above 

$$\pi_i(\theta_i, t_i(x^t)) \propto l_i(\theta_i \mid t_i(x^t))\, \pi_i(\theta_i).$$


By hypothesis $\pi_i(\theta_i, t_i(x^t))$ will be delivered by $G_i$ and adopted by the SB. So the IDSS is sound. 
In particular we can deduce, through the definition of panel separability and the results of Theorem 3.1, that 


$$\perp\!\!\!\perp_{i \in [m]} \theta_i \mid I_0^0, I_*^0 \;\implies\; \perp\!\!\!\perp_{i \in [m]} \theta_i \mid I_0^t, I_*^t.$$


So we have separability a posteriori. Note that in the notation above 


and, for $i \in [m]$, 

$$\{I_{ii}^t\} = \{I_{ii}^0, t_i(x^t)\}.$$


Since by definition $t_i(x^t)$ is known to $G_i$, $i \in [m]$, the system is also delegatable. Finally note that if 

$$l(\theta \mid x^t) \neq \prod_{i \in [m]} l_i(\theta_i \mid t_i(x^t))$$

on a set $A$ of non-zero prior measure, then the conditional density $\pi_A(\theta \mid x^t)$ on $A$ will have the 
property that for all $\theta \in A$ 

$$\pi_A(\theta \mid x^t) \neq \prod_{i \in [m]} \pi_{A,i}(\theta_i, t_i(x^t)),$$

where $\pi_{A,i}$ denotes the density delivered by panel $G_i$ for the parameters it oversees in $A$. This means 
that panel parameters are a posteriori dependent and so in particular the density determined by the margins is 
not sound. 
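
The two halves of this argument can be illustrated numerically. The sketch below is not part of the original proof and rests on assumptions chosen purely for illustration: two hypothetical panels, each overseeing one probability parameter restricted to a small grid, independent priors, and binomial-type likelihood kernels. With a panel separable likelihood the joint posterior coincides with the product of the posteriors each panel would deliver autonomously; with a likelihood that depends on the product $\theta_1 \theta_2$ the joint posterior differs from the product of its own margins.

```python
from itertools import product

GRID = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical discretised parameter values

def normalise(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def binom_kernel(theta, k, n):
    """Binomial likelihood kernel for k successes in n trials."""
    return theta ** k * (1.0 - theta) ** (n - k)

def joint_posterior(prior1, prior2, lik):
    """Posterior over (theta_1, theta_2) under independent priors and likelihood lik."""
    return normalise({(a, b): prior1[a] * prior2[b] * lik(a, b)
                      for a, b in product(GRID, GRID)})

def gap_from_product_of_margins(joint):
    """Largest absolute difference between a joint table and the product of its margins."""
    m1 = {a: sum(joint[(a, b)] for b in GRID) for a in GRID}
    m2 = {b: sum(joint[(a, b)] for a in GRID) for b in GRID}
    return max(abs(joint[(a, b)] - m1[a] * m2[b]) for a, b in product(GRID, GRID))

# Independent panel priors (assumed values).
prior1 = {a: 0.2 for a in GRID}
prior2 = dict(zip(GRID, [0.1, 0.2, 0.4, 0.2, 0.1]))

# Panel separable case: panel 1 sees 3/10 successes, panel 2 sees 7/10.
def separable_lik(a, b):
    return binom_kernel(a, 3, 10) * binom_kernel(b, 7, 10)

post_sep = joint_posterior(prior1, prior2, separable_lik)
post1 = normalise({a: prior1[a] * binom_kernel(a, 3, 10) for a in GRID})
post2 = normalise({b: prior2[b] * binom_kernel(b, 7, 10) for b in GRID})
sep_error = max(abs(post_sep[(a, b)] - post1[a] * post2[b])
                for a, b in product(GRID, GRID))

# Non-separable case: 2/10 successes of an event with probability theta_1 * theta_2.
def coupled_lik(a, b):
    return binom_kernel(a * b, 2, 10)

post_dep = joint_posterior(prior1, prior2, coupled_lik)
dep_gap = gap_from_product_of_margins(post_dep)

print(sep_error, dep_gap)
```

In the separable case `sep_error` is zero up to floating-point rounding, whereas `dep_gap` is clearly positive, so in the coupled case the density determined by the margins no longer reproduces the joint posterior.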